Sciweavers

577 search results - page 32 / 116
» Improved Text Generation Using N-gram Statistics
COLING
1996
14 years 11 months ago
The Automatic Extraction of Open Compounds from Text Corpora
This paper describes a new method for extracting open compounds (uninterrupted sequences of words) from text corpora of languages such as Thai, Japanese and Korean that exhibit un...
Virach Sornlertlamvanich, Hozumi Tanaka
SIGIR
2002
ACM
14 years 9 months ago
Risk minimization and language modeling in text retrieval dissertation abstract
Dissertation Abstract. ChengXiang Zhai (Advisor: John Lafferty), Language Technologies Institute, School of Computer Science, Carnegie Mellon University. With the dramatic increase in online in...
ChengXiang Zhai
KDD
2006
ACM
15 years 10 months ago
Extracting key-substring-group features for text classification
In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
Dell Zhang, Wee Sun Lee
CICLING
2007
Springer
15 years 4 months ago
Morphological Disambiguation of Turkish Text with Perceptron Algorithm
Abstract. This paper describes the application of the perceptron algorithm to the morphological disambiguation of Turkish text. Turkish has a productive derivational morphology. Du...
Hasim Sak, Tunga Güngör, Murat Saraclar
ICML
1997
IEEE
15 years 2 months ago
A Comparative Study on Feature Selection in Text Categorization
This paper is a comparative study of feature selection methods in statistical learning of text categorization. The focus is on aggressive dimensionality reduction. Five methods we...
Yiming Yang, Jan O. Pedersen