Sciweavers

NIPS 2004
A Probabilistic Model for Online Document Clustering with Application to Novelty Detection
In this paper we propose a probabilistic model for online document clustering. We use a non-parametric Dirichlet process prior to model the growing number of clusters, and use a pri...
Jian Zhang 0003, Zoubin Ghahramani, Yiming Yang
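The abstract's point that a Dirichlet process prior can model a growing number of clusters can be illustrated through the prior's Chinese restaurant process representation. This is a minimal, generic sketch of that prior only, not the authors' clustering or novelty-detection model; the function name crp_assignments and the concentration parameter alpha are illustrative assumptions.

```python
import random

def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Hypothetical helper for illustration; alpha is the DP concentration parameter.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already assigned to cluster k
    labels = []
    for _ in range(n_items):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # the item opens a new cluster
        else:
            counts[k] += 1    # the item joins an existing cluster
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(50, alpha=1.0, seed=1)
print(len(counts), "clusters for", len(labels), "items")
```

Each new item joins an existing cluster with probability proportional to that cluster's size, or opens a new one with probability proportional to alpha, so the number of clusters is not fixed in advance and tends to grow roughly logarithmically with the number of items.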

IDEAS 2008 (IEEE)
Improved count suffix trees for natural language data
As more and more natural language text is stored in databases, handling the corresponding query predicates becomes very important. Optimizing queries with such predicates includes (sub)string ...
Guido Sautter, Cristina Abba, Klemens Böhm

IJDAR 2007
Investigation and modeling of the structure of texting language
Language usage in computer-mediated discourse, such as chats, emails and SMS texts, differs significantly from the standard form of the language. An urge towards shorter message l...
Monojit Choudhury, Rahul Saraf, Vijit Jain, Animes...

EMNLP 2009
Discriminative Corpus Weight Estimation for Machine Translation
Current statistical machine translation (SMT) systems are trained on sentence-aligned and word-aligned parallel text collected from various sources. Translation model parameters ar...
Spyros Matsoukas, Antti-Veikko I. Rosti, Bing Zhan...

NLPRS 2001 (Springer)
A Simple Closed-Class/Open-Class Factorization for Improved Language Modeling
We describe a simple improvement to n-gram language models where we estimate the distribution over closed-class (function) words separately from the conditional distribution of ope...
Fuchun Peng, Dale Schuurmans