Sciweavers

» On supporting effective web extraction
ICDAR
2003
IEEE
15 years 3 months ago
Reference Line Extraction from Form Documents with Complicated Backgrounds
Form document analysis is one of the most essential tasks in document analysis and recognition. One of the most fundamental and crucial tasks is the extraction of the reference li...
Dihua Xi, Seong-Whan Lee
PVLDB
2008
14 years 9 months ago
WebTables: exploring the power of tables on the web
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
IJMMS
2007
14 years 9 months ago
Ontologies as facilitators for repurposing web documents
This paper investigates the role of ontologies as a central part of an architecture to repurpose existing material from the web. A prototype system called ArtEquAKT is presented, ...
Mark J. Weal, Harith Alani, Sanghee Kim, Paul H. L...
SIGMOD
2008
ACM
15 years 10 months ago
SystemT: a system for declarative information extraction
As applications within and outside the enterprise encounter increasing volumes of unstructured data, there has been renewed interest in the area of information extraction (IE) - t...
Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghava...
IAT
2008
IEEE
15 years 4 months ago
Acquiring Vague Temporal Information from the Web
Many real-world information needs are naturally formulated as queries with temporal constraints. However, the structured temporal background information needed to support such c...
Steven Schockaert, Martine De Cock, Etienne E. Ker...