Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summariza...
Text line segmentation in unconstrained handwritten documents remains a challenge because handwritten text lines are multi-skewed and not obviously separated. This paper presents ...
In this paper, we propose a probabilistic model for online document clustering. We use a non-parametric Dirichlet process prior to model the growing number of clusters, and use a pri...
Both Topic Maps and RDF are popular semantic web standards designed for machine processing of web documents. Since these representations were originally created for different purpo...
We present a non-traditional retrieval problem we call subtopic retrieval. The subtopic retrieval problem is concerned with finding documents that cover many different subtopics ...
ChengXiang Zhai, William W. Cohen, John D. Laffert...