Relying on the idea that back-of-the-book indexes are traditional devices for navigation through large documents, we have developed a method to build a hypertextual network that h...
Modern search engines are expected to make documents searchable shortly after they appear on the ever changing Web. To satisfy this requirement, the Web is frequently crawled. Due...
Almost all document analysis approaches need to perform a global analysis of the page orientation as a separate process at an early stage. It would be preferable to estimate the o...
Supporting entity extraction from large document collections is important for enabling a variety of important data analysis tasks. In this paper, we introduce the "ad-hoc&quo...
In this paper, we address the problem of database selection for XML document collections, that is, given a set of collections and a user query, how to rank the collections based o...