In this paper, we present an automated, quantitative, knowledge-poor method to evaluate the randomness of a collection of documents (corpus), with respect to a number of biased pa...
To tackle the problem of presenting a large number of options in spoken dialogue systems, we identify compelling options based on a model of user preferences, and present tradeoff...
Faced with the problem of annotation errors in part-of-speech (POS) annotated corpora, we develop a method for automatically correcting such errors. Building on top of a successfu...
Probabilistic Latent Semantic Analysis (PLSA) models have been shown to provide a better model for capturing polysemy and synonymy than Latent Semantic Analysis (LSA). However, th...
In this paper, we propose an approach for identifying curatable articles from a large document set. This system considers three parts of an article (title and abstract, MeSH terms, and ca...