Sciweavers

A Maximum Entropy Approach to Identifying Sentence Boundaries

We present a trainable model for identifying sentence boundaries in raw text. Given a corpus annotated with sentence boundaries, our model learns to classify each occurrence of ., ?, and ! as either a valid or invalid sentence boundary. The training procedure requires no hand-crafted rules, lexica, part-of-speech tags, or domain-specific information. The model can therefore be trained easily on any genre of English, and should be trainable on any other Roman-alphabet language. Performance is comparable to or better than the performance of similar systems, but we emphasize the simplicity of retraining for new domains.
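To make the classification setup concrete, here is a minimal sketch of the first step only: locating each candidate punctuation mark and collecting some surrounding context that a trained classifier could use. The function name `candidate_contexts` and the particular features (prefix, suffix, next word, capitalization) are illustrative assumptions, not the feature set or code from the paper, and no maximum entropy model is trained here.

```python
import re


def candidate_contexts(text):
    """Return (index, features) for each '.', '?' or '!' in text.

    The features are hypothetical examples of local context; the
    paper's actual feature templates are not reproduced here.
    """
    contexts = []
    for match in re.finditer(r"[.?!]", text):
        i = match.start()
        # Non-space characters immediately before the candidate mark.
        prefix = re.search(r"\S*$", text[:i]).group()
        after = text[i + 1:]
        # Non-space characters immediately after the candidate mark.
        suffix = re.match(r"\S*", after).group()
        # The next whitespace-separated word, if any.
        next_word = re.match(r"\s*(\S*)", after[len(suffix):]).group(1)
        contexts.append((i, {
            "prefix": prefix,
            "suffix": suffix,
            "next_word": next_word,
            "prefix_is_capitalized": prefix[:1].isupper(),
            "next_is_capitalized": next_word[:1].isupper(),
        }))
    return contexts


if __name__ == "__main__":
    for index, features in candidate_contexts("Mr. Smith left. He was late!"):
        print(index, features)
```

Each feature dictionary would then be paired with a valid/invalid label from the annotated corpus and passed to whatever classifier is being trained.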
Jeffrey C. Reynar, Adwait Ratnaparkhi
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 1997
Where ANLP