This paper describes a newly created text corpus of news articles that has been annotated for cross-document co-reference. Being able to robustly resolve references to entities ac...
David Day, Janet Hitzeman, Michael L. Wick, Keith ...
Models of latent document semantics such as the mixture of multinomials model and Latent Dirichlet Allocation have received substantial attention for their ability to discover top...
Daniel David Walker, William B. Lund, Eric K. Ring...
We propose an approach to restore severely degraded document images using a probabilistic context model. Unlike traditional approaches that use previously learned prior models...
Jyotirmoy Banerjee, Anoop M. Namboodiri, C. V. Jaw...
Research on linear text segmentation has been an on-going focus in NLP for the last decade, and it has great potential for a wide range of applications such as document summarizati...
Jingbo Zhu, Na Ye, Xinzhi Chang, Wenliang Chen, Be...
As access to hypermedia documents becomes generally available, it becomes increasingly important to understand how casual users search for information. We have studied search patt...