Sciweavers

244 search results - page 7 / 49
» From HTML documents to web tables and rules
DEXAW
2008
IEEE
15 years 6 months ago
Text Extraction from the Web via Text-to-Tag Ratio
We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
Tim Weninger, William H. Hsu
WWW
2007
ACM
16 years 13 days ago
Towards domain-independent information extraction from web tables
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
Bernhard Krüpl, Bernhard Pollak, Marcus Herzo...
DAS
2004
Springer
15 years 5 months ago
Rule-Based Structural Analysis of Web Pages
Structural analysis of web pages has been proposed several times and for a number of reasons and purposes, such as the re-flowing of standard web pages to fit a smaller PDA screen....
Fabio Vitali, Angelo Di Iorio, Elisa Ventura Campo...
COMAD
2009
15 years 26 days ago
Querying for relations from the semi-structured Web
We present a class of web queries whose result is a multi-column relation instead of a collection of unstructured documents as in standard web search. The user specifies the query...
Sunita Sarawagi
TREC
2000
15 years 1 month ago
Information Space Based on HTML Structure
The main goal for the Information Space system for TREC9 was early precision. To facilitate this, an emphasis was placed on seeking matches from only the TITLE, H1, H2 and H3 tags...
Gregory B. Newby