Sciweavers

483 search results - page 3 / 97
» Sampling the Web as Training Data for Text Classification
SIGMOD
2004
ACM
14 years 6 months ago
When one Sample is not Enough: Improving Text Database Selection Using Shrinkage
Database selection is an important step when searching over large numbers of distributed text databases. The database selection task relies on statistical summaries of the databas...
Panagiotis G. Ipeirotis, Luis Gravano
EMNLP
2010
13 years 3 months ago
Negative Training Data Can be Harmful to Text Classification
This paper studies the effects of training data on binary text classification and postulates that negative training data is not needed and may even be harmful for the task. Tradit...
Xiaoli Li, Bing Liu, See-Kiong Ng
ICTIR
2009
Springer
13 years 3 months ago
Training Data Cleaning for Text Classification
In text classification (TC) and other tasks involving supervised learning, labelled data may be scarce or expensive to obtain; strategies are thus needed for maximizing t...
Andrea Esuli, Fabrizio Sebastiani
SDM
2008
SIAM
13 years 7 months ago
Semantic Smoothing for Bayesian Text Classification with Small Training Data
Bayesian text classifiers face a common issue referred to as the data sparsity problem, especially when the size of training data is very small. The frequently used Laplacian...
Xiaohua Zhou, Xiaodan Zhang, Xiaohua Hu
DEXAW
2010
IEEE
13 years 2 months ago
A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion Classification in Blogs
In the blogosphere, the amount of digital content is expanding and for search engines, new challenges have been imposed. Due to the changing information need, automatic methods are...
Elisabeth Lex, Andreas Juffinger, Michael Granitze...