Classifying Sentence-Based Summaries of Web Documents

Text classification categorizes Web documents in large collections into predefined classes based on their contents. Unfortunately, the classification process can be time-consuming, and users are still required to spend a considerable amount of time scanning through the classified Web documents to identify the ones that satisfy their information needs. To solve this problem, we first introduce CorSum, an extractive single-document summarization approach, which is simple and effective in performing the summarization task, since it relies only on word similarity to generate high-quality summaries. We then train a Naïve Bayes classifier on CorSum-generated summaries and evaluate both the classification accuracy obtained with the summaries and the speed-up achieved during the process. Experimental results on the DUC-2002 and 20 Newsgroups datasets show that CorSum outperforms other extractive summarization methods, and classification time is significantly reduced using CorSum-generated summaries w...
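The two-stage idea in the abstract (summarize each document by sentence selection, then classify the summaries) can be illustrated with a minimal sketch. It is not the authors' implementation: Jaccard word overlap stands in for CorSum's word-similarity measure, whose details the abstract does not give, and the names tokenize, summarize, and NaiveBayes are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def summarize(document, k=2):
    """Keep the k sentences with the highest total word overlap with the others.

    Assumption: Jaccard overlap is only a stand-in for the word-similarity
    measure used by CorSum, which is not specified in the abstract.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    bags = [set(tokenize(s)) for s in sentences]

    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    scores = [
        sum(jaccard(bags[i], bags[j]) for j in range(len(bags)) if j != i)
        for i in range(len(bags))
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Emit the selected sentences in their original document order.
    return " ".join(sentences[i] for i in sorted(top))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In this sketch the classifier is trained on summarize(doc) rather than on doc itself, so it processes fewer tokens per document, which is the source of the speed-up the abstract describes.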
Maria Soledad Pera, Yiu-Kai Ng
Added 24 May 2010
Updated 24 May 2010
Type Conference
Year 2009
Authors Maria Soledad Pera, Yiu-Kai Ng