Unlike conventional data or text, Web pages typically contain a large amount of information that is not part of the main contents of the pages, e.g., banner ads, navigation bars, ...
A seed-based framework for textual information extraction allows for weakly supervised acquisition of open-domain class attributes over conceptual hierarchies, from a combination ...
XML is unique in its very broad acceptance throughout both the document engineering and data processing community. This creates a unique opportunity for unifying the traditionall...
Andrea R. de Andrade, Ethan V. Munson, Maria da Gr...
The growing dependence of modern society on the Web as a vital source of information and communication has become inevitable. However, the Web has become an ideal channel for vari...
This paper addresses the problem of extracting information from textual documents, either normal documents or web pages. A new approach for extracting complicated information from ...
Luo Xiao, Dieter Wissmann, Michael Brown, Stefan J...