Abstract. Automatic structuring is one means to ease access to document collections, be it for organization or for exploration. Of even greater help would be a presentation that ad...
We describe a methodology for retrieving document images from large, extremely diverse collections. First we perform content extraction, that is, the location and measurement of reg...
This paper describes an approach to attention-based layout segmentation using general principles of human visual perception to achieve this goal. The text is considered as tex...
This paper explores several methods for visualizing the thematic content of large document collections. As opposed to traditional query-driven document retrieval, these methods ...
Nancy Miller, Elizabeth G. Hetzler, Grant Nakamura...
Abstract. Latent Semantic Indexing (LSI) has proved effective at capturing the semantic structure of document collections. It is widely used in content-based text retrieval...