Search Sciweavers | Sciweavers

NIPS
2008

143 views, Information Technology

Semi-supervised Learning with Weakly-Related Unlabeled Data: Towards Better Text Categorization

14 years 11 months ago

The cluster assumption is exploited by most semi-supervised learning (SSL) methods. However, if the unlabeled data is merely weakly related to the target classes, it becomes quest...

Liu Yang, Rong Jin, Rahul Sukthankar

IDEAS
2008
IEEE

80 views, Database

Improved count suffix trees for natural language data

15 years 4 months ago

With more and more natural language text stored in databases, handling respective query predicates becomes very important. Optimizing queries with predicates includes (sub)string ...

Guido Sautter, Cristina Abba, Klemens Böhm

HPDC
2010
IEEE

193 views, Distributed And Parallel Com...

Reshaping text data for efficient processing on Amazon EC2

14 years 11 months ago

Text analysis tools are nowadays required to process increasingly large corpora which are often organized as small files (abstracts, news articles, etc). Cloud computing offers a ...

Gabriela Turcu, Ian T. Foster, Svetlozar Nestorov

CSL
2006
Springer

143 views, Automated Reasoning

A study in machine learning from imbalanced data for sentence boundary detection in speech

14 years 10 months ago

Enriching speech recognition output with sentence boundaries improves its human readability and enables further processing by downstream language processing modules. We have const...

Yang Liu, Nitesh V. Chawla, Mary P. Harper, Elizab...

TASLP
2008

143 views

Strategies to Improve the Robustness of Agglomerative Hierarchical Clustering Under Data Source Variation for Speaker Diarization

14 years 10 months ago

Many current state-of-the-art speaker diarization systems exploit agglomerative hierarchical clustering (AHC) as their speaker clustering strategy, due to its simple processing str...

K. J. Han, S. Kim, S. S. Narayanan
