ICDAR
1999
IEEE

Segmenting Documents using Multiple Lexical Features

A method is presented for segmenting documents into conceptually related areas. Determining the equivalence of text is often based on the number of word repetitions. This approach is unsuitable for detecting short segments because terms tend not to be repeated across just a few sentences. In this paper we investigate the contribution of two other lexical features to find related words: collocation and relation weights (which identify semantic relations). An experiment was conducted on a set of test data with known topic changes; performances of the three features were independently compared. A combination of all features was the most reliable indicator of a topic change. In another experiment, CNN news summaries were segmented into their individual news stories. Precision and recall rates of around 90% are reported for news story boundary detection.
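As a rough illustration of the word-repetition baseline and the precision and recall measures mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. The function names, the regex tokenizer, the window size, and the Jaccard-style overlap score are all assumptions made for illustration. The collocation and relation-weight features are not shown, since they depend on lexical resources the abstract does not describe.

```python
import re


def tokens(sentence):
    """Lowercase word tokens of a sentence (hypothetical, simplistic tokenizer)."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def repetition_scores(sentences, window=2):
    """Score each gap between sentences by word overlap across the gap.

    scores[i] belongs to the gap between sentences[i] and sentences[i + 1].
    A low score means few repeated words, hinting at a possible topic change.
    The overlap measure (shared words / all words) is an assumption.
    """
    scores = []
    for gap in range(1, len(sentences)):
        left = set().union(*(tokens(s) for s in sentences[max(0, gap - window):gap]))
        right = set().union(*(tokens(s) for s in sentences[gap:gap + window]))
        union = left | right
        scores.append(len(left & right) / len(union) if union else 0.0)
    return scores


def precision_recall(predicted, actual):
    """Precision and recall of predicted boundary positions against known ones."""
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    return precision, recall
```

With a window of one sentence, a gap whose neighbouring sentences share no words scores 0.0, which is the weakness the abstract points out: short segments often contain no repeated terms, so repetition alone cannot separate them reliably.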
Amanda C. Jobbins, Lindsay J. Evett
Added 03 Aug 2010
Updated 03 Aug 2010
Type Conference
Year 1999
Where ICDAR