Form document analysis is one of the most essential tasks in document analysis and recognition. One of the most fundamental and crucial tasks is the extraction of the reference li...
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
This paper investigates the role of ontologies as a central part of an architecture to repurpose existing material from the web. A prototype system called ArtEquAKT is presented, ...
Mark J. Weal, Harith Alani, Sanghee Kim, Paul H. L...
As applications within and outside the enterprise encounter increasing volumes of unstructured data, there has been renewed interest in the area of information extraction (IE) ? t...
Many real–world information needs are naturally formulated as queries with temporal constraints. However, the structured temporal background information needed to support such c...
Steven Schockaert, Martine De Cock, Etienne E. Ker...