Sciweavers

FLAIRS
2004

Automatic Generation of Background Text to Aid Classification

We show that Web searches can often be used to generate background text for text classification, because the World Wide Web frequently contains many pages relevant to a particular classification task. An automatic method for creating a secondary corpus of unlabeled but related documents can help decrease error rates in text categorization problems. Furthermore, if the test corpus is known, this related set of documents can be tailored to the particular categorization problem in a transductive approach. Our system uses WHIRL, a tool that combines database functionality with techniques from the information retrieval literature. The method is especially useful when training examples are few, or when obtaining them is expensive or difficult.
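The idea of using unlabeled background text as a bridge between training and test documents can be illustrated with a small sketch. This is not WHIRL and not the authors' system: it is a simplified stand-in that assumes TF-IDF cosine similarity and a nearest-neighbor style vote. The function names (`tfidf_vectors`, `cosine`, `classify`) and the toy data are hypothetical, and the background documents are a fixed list standing in for pages a Web search might return.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one unit-length TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF, log((n + 1) / df), is always positive.
        vec = {t: (1 + math.log(c)) * math.log((n + 1) / df[t])
               for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two unit-length sparse vectors."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def classify(test_doc, train, background, use_background=True):
    """train is a list of (text, label) pairs; returns a label or None."""
    docs = [text for text, _ in train] + list(background) + [test_doc]
    vecs = tfidf_vectors(docs)
    train_vecs = vecs[:len(train)]
    bg_vecs = vecs[len(train):len(train) + len(background)]
    test_vec = vecs[-1]

    scores = defaultdict(float)
    for (_, label), tv in zip(train, train_vecs):
        # Direct similarity between the training and test documents.
        scores[label] += cosine(tv, test_vec)
        if use_background:
            # Assumed bridging rule: a background document that resembles
            # both the training example and the test document adds evidence.
            for bv in bg_vecs:
                scores[label] += cosine(tv, bv) * cosine(bv, test_vec)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy data (hypothetical). The test document shares no words with either
# training example, so only the background text can connect them.
TRAIN = [("neural network training", "ml"),
         ("stock market prices", "finance")]
BACKGROUND = [
    "backpropagation is used for neural network training with gradient descent",
    "bond yields and stock prices moved the market",
]
TEST_DOC = "backpropagation gradient"

print(classify(TEST_DOC, TRAIN, BACKGROUND, use_background=False))  # None
print(classify(TEST_DOC, TRAIN, BACKGROUND))                        # ml
```

With only two short training examples the direct comparison finds no overlap, while the background document about backpropagation links the test document to the "ml" example. In a transductive variant, the background list would be chosen with the known test documents in mind; the sketch leaves that step out.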
Sarah Zelikovitz, Robert Hafner
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2004
Where FLAIRS
Authors Sarah Zelikovitz, Robert Hafner