Sciweavers

ACL
2009
13 years 2 months ago
A Succinct N-gram Language Model
Efficient processing of tera-scale text data is an important research topic. This paper proposes lossless compression of N-gram language models based on LOUDS, a succinct data stru...
Taro Watanabe, Hajime Tsukada, Hideki Isozaki
ACL
2009
13 years 2 months ago
Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling
In this paper, we propose a new Bayesian model for fully unsupervised word segmentation and an efficient blocked Gibbs sampler combined with dynamic programming for inference. Our...
Daichi Mochihashi, Takeshi Yamada, Naonori Ueda
SIGDIAL
2010
13 years 2 months ago
The Effects of Discourse Connectives Prediction on Implicit Discourse Relation Recognition
Implicit discourse relation recognition is difficult due to the absence of explicit discourse connectives between arbitrary spans of text. In this paper, we use language models to...
Zhi Min Zhou, Man Lan, Zhengyu Niu, Yu Xu, Jian Su
NAACL
2010
13 years 2 months ago
Using Mostly Native Data to Correct Errors in Learners' Writing
We present results from a range of experiments on article and preposition error correction for non-native speakers of English. We first compare a language model and error-specific ...
Michael Gamon
ICTAI
2010
IEEE
13 years 2 months ago
A Semantic Similarity Language Model to Improve Automatic Image Annotation
In recent years, with the rapid proliferation of digital images, the need to search and retrieve the images accurately, efficiently, and conveniently is becoming more acute. Automa...
Tianxia Gong, Shimiao Li, Chew Lim Tan
ACL
2010
13 years 2 months ago
Authorship Attribution Using Probabilistic Context-Free Grammars
In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach in...
Sindhu Raghavan, Adriana Kovashka, Raymond J. Moon...
TSD
2010
Springer
13 years 2 months ago
Recovery of Rare Words in Lecture Speech
The vocabulary used in speech usually consists of two types of words: a limited set of common words, shared across multiple documents, and a virtually unlimited set of rare words, ...
Stefan Kombrink, Mirko Hannemann, Lukas Burget, Hy...
TALIP
2002
13 years 4 months ago
Toward a unified approach to statistical language modeling for Chinese
This paper presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques like trigram language models to Chinese is challenging because (1) t...
Jianfeng Gao, Joshua Goodman, Mingjing Li, Kai-Fu ...
SIGIR
2002
ACM
13 years 4 months ago
Language model for IR using collection information
In this paper, we explored how to use meta-data information in information retrieval tasks. We presented a new language model that is able to take advantage of the category informa...
Rong Jin, Luo Si, Alexander G. Hauptmann, James P....
CORR
2000
Springer
13 years 4 months ago
Recognition Performance of a Structured Language Model
A new language model for speech recognition inspired by linguistic analysis is presented. The model develops hidden hierarchical structure incrementally and uses it to extract mea...
Ciprian Chelba, Frederick Jelinek