We describe a system for retrieving document images from collections stored in digital libraries on the basis of layout similarity. Layout regions are extracted and ...
In this paper, we present a logical representation of form documents for use in identification and retrieval. A hierarchical structure is proposed to represent the s...
In searching a repository of business documents, one task of interest is using a query signature image to retrieve, from a database, other signatures that match the query. The ...
Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Ha...
We present a document expansion approach that uses Conditional Random Field (CRF) segmentation to automatically extract salient phrases from ad titles. We then supplement the ad d...
This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...