OCELOT: a system for summarizing Web pages

Abstract We introduce OCELOT, a prototype system for automatically generating the “gist” of a web page by summarizing it. Although most text summarization research to date has focused on news articles, web pages are quite different in both structure and content. Instead of coherent text with a well-defined discourse structure, they are more likely to be a chaotic jumble of phrases, links, graphics and formatting commands. Such text provides little foothold for extractive summarization techniques, which attempt to generate a summary of a document by excerpting a contiguous, coherent span of text from it. This paper builds upon recent work in nonextractive summarization, producing the gist of a web page by “translating” it into a more concise representation rather than attempting to extract a text span verbatim. OCELOT uses probabilistic models to guide it in selecting and ordering words into a gist. This paper describes a technique for learning these models au...
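The abstract says only that OCELOT selects and orders words using probabilistic models. The Python sketch below is a rough illustration of that general idea. It is not the paper's actual models, training data, or search procedure, none of which the truncated abstract describes. It assumes two hypothetical components: a word-to-word selection table p(gist word | page word), averaged over the page in the style of a simple statistical translation model, and a bigram language model that scores candidate orderings. The tables `TRANSLATION` and `BIGRAM`, the `UNSEEN` floor, and the functions `selection_scores`, `order_log_prob`, and `gist` are all invented for this example, and every probability in them is made up.

```python
import itertools
import math
from collections import defaultdict

# HYPOTHETICAL selection table p(gist_word | page_word). All numbers are
# invented for illustration; they are not OCELOT's learned parameters.
# Page words with no entry (e.g. "click", "here") contribute nothing.
TRANSLATION = {
    "learn": {"tutorial": 0.4, "programming": 0.6},
    "python": {"python": 0.9, "programming": 0.1},
    "tutorial": {"tutorial": 0.8, "guide": 0.2},
}

# HYPOTHETICAL bigram model p(next | prev); "<s>" marks the start of the gist.
BIGRAM = {
    ("<s>", "python"): 0.5,
    ("<s>", "tutorial"): 0.2,
    ("python", "programming"): 0.4,
    ("programming", "tutorial"): 0.6,
    ("tutorial", "python"): 0.1,
}
UNSEEN = 1e-4  # assumed floor probability for bigrams not in the table


def selection_scores(page_words):
    """Score each candidate gist word g as (1/|d|) * sum_j p(g | d_j)."""
    scores = defaultdict(float)
    for d in page_words:
        for g, p in TRANSLATION.get(d, {}).items():
            scores[g] += p / len(page_words)
    return dict(scores)


def order_log_prob(words):
    """Log-probability of a word sequence under the toy bigram model."""
    prev, total = "<s>", 0.0
    for w in words:
        total += math.log(BIGRAM.get((prev, w), UNSEEN))
        prev = w
    return total


def gist(page_words, k=3):
    """Select the k best-scoring words, then return their most probable order."""
    scores = selection_scores(page_words)
    # Sort by descending score, breaking ties alphabetically for determinism.
    chosen = sorted(scores, key=lambda g: (-scores[g], g))[:k]
    # Exhaustive search over all k! orderings; only feasible for tiny k.
    best = max(itertools.permutations(chosen), key=order_log_prob)
    return list(best)


if __name__ == "__main__":
    print(gist("learn python tutorial click here".split()))
```

With these made-up tables, `gist("learn python tutorial click here".split())` returns `['python', 'programming', 'tutorial']`. The page-chrome words "click" and "here" have no entries and contribute nothing. "programming", which never appears on the page, is selected through the translation-style table, which is the nonextractive behavior the abstract contrasts with excerpting a text span. Enumerating all k! orderings only works for very small k; a practical system would need a heuristic search such as beam search.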
Adam L. Berger, Vibhu O. Mittal
Added 01 Aug 2010
Updated 01 Aug 2010
Type Conference
Year 2000