Sciweavers

483 search results - page 6 / 97
» Sampling the Web as Training Data for Text Classification
NAACL
2003
14 years 11 months ago
A Web-Trained Extraction Summarization System
A serious bottleneck in the development of trainable text summarization systems is the shortage of training data. Constructing such data is a very tedious task, especially because...
Liang Zhou, Eduard H. Hovy
DRR
2009
14 years 7 months ago
Using synthetic data safely in classification
When is it safe to use synthetic data in supervised classification? Trainable classifier technologies require large representative training sets consisting of samples labeled with...
Jean Nonnemaker, Henry Baird
SIGIR
2005
ACM
15 years 3 months ago
Automatic web query classification using labeled and unlabeled training data
Accurate topical categorization of user queries allows for increased effectiveness, efficiency, and revenue potential in general-purpose web search systems. Such categorization be...
Steven M. Beitzel, Eric C. Jensen, Ophir Frieder, ...
KDD
2002
ACM
15 years 10 months ago
Learning to match and cluster large high-dimensional data sets for data integration
Part of the process of data integration is determining which sets of identifiers refer to the same real-world entities. In integrating databases found on the Web or obtained by us...
William W. Cohen, Jacob Richman
IPM
2002
14 years 9 months ago
A feature mining based approach for the classification of text documents into disjoint classes
This paper proposes a new approach for classifying text documents into two disjoint classes. The new approach is based on extracting patterns, in the form of two logical expressio...
Salvador Nieto Sánchez, Evangelos Triantaph...