This paper describes the lifecycle of a digital historical document, from template-based structure definition through to content extraction from the scanned pages and its final re...
Information Extraction methods can be used to automatically "fill-in" database forms from unstructured data such as Web documents or email. State-of-the-art methods have...
Trausti T. Kristjansson, Aron Culotta, Paul A. Vio...
In this paper we discuss the possible application of new concepts in web content extraction: utility assessment, utility annealing, and dynamic aggregated document generation. Aft...
Information Extraction (IE) from text /web documents has become an important application area of AI. As the number of web sites and documents has grown dramatically, the users need...
The ability to find tables and extract information from them is a necessary component of many information retrieval tasks. Documents often contain tables in order to communicate d...