—Content-based document image retrieval is a new and promising research area. Without OCR, document indexing directly based on image content is more general and convenient. Howev...
A semantic-centric approach to information visualization is described in this paper. In contrast to the conventional document-centric approaches, the semantic-centric approach foc...
Query-independent features (also called document priors), such as the number of incoming links to a document, its Page-Rank, or the type of its associated URL, have been successfu...
Coreferencing entities across documents in a large corpus enables advanced document understanding tasks such as question answering. This paper presents a novel cross document core...
Jian Huang 0002, Sarah M. Taylor, Jonathan L. Smit...
A vast amount of documents in the Web have duplicates, which is a challenge for developing efficient methods that would compute clusters of similar documents. In this paper we use ...