Sciweavers

483 search results - page 15 / 97
» Sampling the Web as Training Data for Text Classification
ICML
2005
IEEE
16 years 3 months ago
Hierarchical Dirichlet model for document classification
The proliferation of text documents on the web as well as within institutions necessitates their convenient organization to enable efficient retrieval of information. Although tex...
Sriharsha Veeramachaneni, Diego Sona, Paolo Avesan...
EACL
2006
ACL Anthology
15 years 4 months ago
Web Text Corpus for Natural Language Processing
Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpu...
Vinci Liu, James R. Curran
EMNLP
2009
15 years 24 days ago
Semi-Supervised Learning for Semantic Relation Classification using Stratified Sampling Strategy
This paper presents a new approach to selecting the initial seed set using stratified sampling strategy in bootstrapping-based semi-supervised learning for semantic relation class...
Longhua Qian, Guodong Zhou, Fang Kong, Qiaoming Zh...
KDD
2004
ACM
16 years 3 months ago
Boosting for Text Classification with Semantic Features
Current text classification systems typically use term stems for representing document content. Semantic Web technologies allow the usage of features on a higher semantic...
Stephan Bloehdorn, Andreas Hotho
ML
2000
ACM
15 years 2 months ago
Text Classification from Labeled and Unlabeled Documents using EM
This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. ...
Kamal Nigam, Andrew McCallum, Sebastian Thrun, Tom...