Sciweavers

IJDLS
2010

Sampling the Web as Training Data for Text Classification

Download irlab.csie.ntu.edu.tw

Data acquisition is a major concern in text classification. The extensive human effort that conventional methods require to build a quality training collection may not always be available to researchers. In this paper, we look into the possibility of automatically collecting training data by sampling the Web with a set of given class names. The basic idea is to populate appropriate keywords and submit them as queries to search engines to acquire training data. Two methods are presented in this study: one samples the common concepts among the classes, and the other samples the discriminative concepts for each class. A series of experiments was carried out independently on two different datasets, and the results show that the proposed methods significantly improve classifier performance even without manually labeled training data. Our strategy for

Wei-Yen Day, Chun-Yi Chi, Ruey-Cheng Chen, Pu-Jen Cheng
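
The basic idea stated in the abstract (turn class names and keywords into search queries, then treat the returned text as labeled training data) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `search` callable, and the `fake_search` stand-in are all hypothetical, and the keyword lists are assumed to be given rather than derived from common or discriminative concepts as the paper's two methods do.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed interface: a search function takes a query string and returns
# a list of text snippets. No real search engine API is implied.
SearchFn = Callable[[str], List[str]]


def build_queries(class_name: str, keywords: Iterable[str]) -> List[str]:
    """Form queries from the bare class name plus class name + keyword."""
    queries = [class_name]
    for keyword in keywords:
        queries.append(f"{class_name} {keyword}")
    return queries


def sample_training_data(
    class_keywords: Dict[str, List[str]],
    search: SearchFn,
    per_query: int = 10,
) -> List[Tuple[str, str]]:
    """Collect (text, label) pairs by querying once per generated query.

    Simplification: a text returned for several classes keeps only the
    label of the first class that retrieved it.
    """
    examples: List[Tuple[str, str]] = []
    seen = set()
    for label, keywords in class_keywords.items():
        for query in build_queries(label, keywords):
            for text in search(query)[:per_query]:
                if text not in seen:
                    seen.add(text)
                    examples.append((text, label))
    return examples


def fake_search(query: str) -> List[str]:
    """Offline stand-in for a search engine, used only for illustration."""
    return [f"document about {query}", f"another page on {query}"]
```

Calling `sample_training_data({"sports": ["league"], "finance": ["stock"]}, fake_search)` yields labeled snippets that could then be fed to any text classifier. In practice the `search` argument would wrap whatever search service is available, and keyword selection would carry most of the method's weight.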

Excessive Human Efforts | IJDLS 2010 | Quality Training Collection | Training

Related Content

» Author Identification: Using Text Sampling to Handle the Class Imbalance Problem

» CBC: Clustering Based Text Classification Requiring Minimal Labeled Data

» Taking Advantage of the Web for Text Classification with Imbalanced Classes

» Classifying High-Dimensional Text and Web Data Using Very Short Patterns

» Combining Clustering and Co-training to Enhance Text Classification Using Unlabelled Data

» Text Sampling and Re-Sampling for Imbalanced Authorship Identification Cases

» A Comparison of Classification Techniques for Technical Text Passages

» Enhancing Training Data for Handwriting Recognition of Whiteboard Notes with Samples from ...

» Integrating Background Knowledge Into Text Classification

Post Info

Added	05 Mar 2011
Updated	05 Mar 2011
Type	Journal
Year	2010
Where	IJDLS
Authors	Wei-Yen Day, Chun-Yi Chi, Ruey-Cheng Chen, Pu-Jen Cheng