Sciweavers

NIPS
2001

Natural Language Grammar Induction Using a Constituent-Context Model

13 years 5 months ago
Natural Language Grammar Induction Using a Constituent-Context Model
This paper presents a novel approach to the unsupervised learning of syntactic analyses of natural language text. Most previous work has focused on maximizing likelihood according to generative PCFG models. In contrast, we employ a simpler probabilistic model over trees based directly on constituent identity and linear context, and use an EM-like iterative procedure to induce structure. This method produces much higher quality analyses, giving the best published results on the ATIS dataset. 1 Overview To enable a wide range of subsequent tasks, human language sentences are standardly given tree-structure analyses, wherein the nodes in a tree dominate contiguous spans of words called constituents, as in figure 1(a). Constituents are the linguistically coherent units in the sentence, and are usually labeled with a constituent category, such as noun phrase (NP) or verb phrase (VP). An aim of grammar induction systems is to figure out, given just the sentences in a corpus S, what tree str...
Dan Klein, Christopher D. Manning
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2001
Where NIPS
Authors Dan Klein, Christopher D. Manning
Comments (0)