Nowadays, information is primarily searched on the WWW. From a user perspective, the readability is an important criterion for measuring the accessibility and thereby the quality ...
This paper describes the lifecycle of a digital historical document, from template-based structure definition through to content extraction from the scanned pages and its final re...
Information Extraction methods can be used to automatically "fill-in" database forms from unstructured data such as Web documents or email. State-of-the-art methods have...
Trausti T. Kristjansson, Aron Culotta, Paul A. Vio...
In this paper we discuss the possible application of new concepts in web content extraction: utility assessment, utility annealing, and dynamic aggregated document generation. Aft...
Previous work on domain specific search services in the area of depressive illness has documented the significant human cost required to setup and maintain closed-crawl parameters....
Thanh Tin Tang, David Hawking, Nick Craswell, Rame...