Sciweavers

483 search results - page 23 / 97
» Sampling the Web as Training Data for Text Classification
INTERSPEECH
2010
14 years 10 months ago
Topic and style-adapted language modeling for Thai broadcast news ASR
The amount of available Thai broadcast news transcribed text for training a language model is still very limited compared to other major languages. Since the construction of a b...
Markpong Jongtaveesataporn, Sadaoki Furui
ISMIR
2004
Springer
15 years 8 months ago
Artist Classification with Web-Based Data
Manifold approaches exist for organization of music by genre and/or style. In this paper we propose the use of text categorization techniques to classify artists present on the In...
Peter Knees, Elias Pampalk, Gerhard Widmer
WWW
2009
ACM
16 years 3 months ago
Combining anchor text categorization and graph analysis for paid link detection
In order to artificially boost the rank of commercial pages in search engine results, search engine optimizers pay for links to these pages on other websites. Identifying paid lin...
Kirill Nikolaev, Ekaterina Zudina, Andrey Gorshkov
SIGIR
2008
ACM
15 years 3 months ago
Topic-bridged PLSA for cross-domain text classification
In many Web applications, such as blog classification and newsgroup classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain ...
Gui-Rong Xue, Wenyuan Dai, Qiang Yang, Yong Yu
NIPS
2008
15 years 4 months ago
Semi-supervised Learning with Weakly-Related Unlabeled Data: Towards Better Text Categorization
The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes quest...
Liu Yang, Rong Jin, Rahul Sukthankar