IWPT
2001

Parsing the CHILDES Database: Methodology and Lessons Learned

This paper discusses the process of parsing adult utterances directed to a child, in an effort to produce a syntactically annotated corpus of the verbal input to a human language learner. In parsing the Eve corpus of the CHILDES database, we encountered several challenges relating to parser coverage and ambiguity, for which we describe solutions that result in a system capable of analyzing almost 80% of the adult utterances in the corpus correctly. We describe characteristics of the language in the corpus that make this task unique, and present specific ways to deal with the analysis of this type of language. We discuss each step of the corpus analysis in detail, focusing on how selected techniques, such as part-of-speech tagging, rule-based robust parsing and statistical disambiguation, affect the trade-off between coverage and accuracy. Finally, we present a detailed evaluation of the performance of our system. A parsed corpus resulting from the research described in this paper is a...
Kenji Sagae, Alon Lavie, Brian MacWhinney
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2001
Where IWPT
Authors Kenji Sagae, Alon Lavie, Brian MacWhinney