Automated extraction of Tree-Adjoining Grammars from treebanks

Published online by Cambridge University Press:  22 December 2005

JOHN CHEN
Microsoft Research Asia, No. 49 Zhichun Road, Haidian District, Beijing 100080, China; e-mail: t-Johnc@microsoft.com
SRINIVAS BANGALORE
AT&T Labs–Research, P.O. Box 971, 180 Park Avenue, Florham Park, NJ 07932, USA; e-mail: srini@research.att.com
K. VIJAY-SHANKER
Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA; e-mail: vijay@cis.udel.edu

Abstract

There has recently been a surge of interest in applying stochastic models to parsing. Tree-adjoining grammar (TAG) has seen relatively limited use in this area, partly because large-scale corpora hand-annotated with TAG structures were unavailable until recently. Our goals are to develop inexpensive means of generating such corpora and to demonstrate their applicability to stochastic modeling. We present a method for automatically extracting a linguistically plausible TAG from the Penn Treebank. We also introduce methods that require little manual labor for inducing a higher-level organization of TAGs. We empirically evaluate several automatically extracted TAGs and demonstrate how the induced higher-level organization can be used to smooth stochastic TAG models.

Copyright © 2005 Cambridge University Press