Sciweavers

115 search results - page 1 / 23
» Training Data Cleaning for Text Classification
Sort
View
ICTIR
2009
Springer
13 years 2 months ago
Training Data Cleaning for Text Classification
Abstract. In text classification (TC) and other tasks involving supervised learning, labelled data may be scarce or expensive to obtain; strategies are thus needed for maximizing t...
Andrea Esuli, Fabrizio Sebastiani
LREC
2008
108views Education» more  LREC 2008»
13 years 6 months ago
A Lightweight and Efficient Tool for Cleaning Web Pages
Originally conceived as a "naive" baseline experiment using traditional n-gram language models as classifiers, the NCLEANER system has turned out to be a fast and lightw...
Stefan Evert
EMNLP
2010
13 years 3 months ago
Negative Training Data Can be Harmful to Text Classification
This paper studies the effects of training data on binary text classification and postulates that negative training data is not needed and may even be harmful for the task. Tradit...
Xiaoli Li, Bing Liu, See-Kiong Ng
IJDLS
2010
108views more  IJDLS 2010»
13 years 2 months ago
Sampling the Web as Training Data for Text Classification
Data acquisition is a major concern in text classification. The excessive human efforts required by conventional methods to build up quality training collection might not always b...
Wei-Yen Day, Chun-Yi Chi, Ruey-Cheng Chen, Pu-Jen ...
SDM
2008
SIAM
133views Data Mining» more  SDM 2008»
13 years 6 months ago
Semantic Smoothing for Bayesian Text Classification with Small Training Data
Bayesian text classifiers face a common issue which is referred to as data sparsity problem, especially when the size of training data is very small. The frequently used Laplacian...
Xiaohua Zhou, Xiaodan Zhang, Xiaohua Hu