Search Sciweavers | Sciweavers

244 search results - page 7 / 49

From HTML documents to web tables and rules

DEXAW
2008
IEEE

Database

Text Extraction from the Web via Text-to-Tag Ratio

Download www.uni-weimar.de

We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...

Tim Weninger, William H. Hsu
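
A minimal Python sketch of the text-to-tag idea, assuming the ratio is computed per line of HTML as the number of non-tag characters divided by the number of tags, with a tag-free line divided by 1. Removing script, style and comment blocks first and keeping lines above a fixed `threshold` are simplifications chosen for illustration; the paper's own preprocessing and its way of separating content lines from boilerplate may differ. The helper names `text_to_tag_ratios` and `extract_content` are hypothetical.

```python
import re

# Matches a single HTML tag such as <p>, </div> or <a href="...">.
TAG_RE = re.compile(r"<[^>]*>")
# Assumption: script/style blocks and comments are dropped first, since their text is not content.
STRIP_RE = re.compile(
    r"<script\b.*?</script>|<style\b.*?</style>|<!--.*?-->", re.S | re.I
)


def text_to_tag_ratios(html: str) -> list[tuple[float, str]]:
    """Return (ratio, line) for every non-blank line of the cleaned HTML."""
    cleaned = STRIP_RE.sub("", html)
    results = []
    for line in cleaned.splitlines():
        if not line.strip():
            continue  # blank lines carry neither text nor tags
        tags = TAG_RE.findall(line)
        text = TAG_RE.sub("", line).strip()
        # Assumption: a line without tags is divided by 1, so its ratio is its text length.
        ratio = len(text) / max(len(tags), 1)
        results.append((ratio, line))
    return results


def extract_content(html: str, threshold: float) -> list[str]:
    """Keep the tag-stripped text of lines whose ratio exceeds a hypothetical fixed threshold."""
    return [
        TAG_RE.sub("", line).strip()
        for ratio, line in text_to_tag_ratios(html)
        if ratio > threshold
    ]
```

On a page where navigation lines are dense with tags and an article paragraph is long running text, the paragraph line gets a much higher ratio than the menu lines, so even a crude threshold separates them. A fixed threshold is the simplest possible decision rule; a per-page rule (for example one derived from the distribution of ratios) would adapt better to pages with different markup density.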

WWW
2007
ACM

Internet Technology

Towards domain-independent information extraction from web tables

Download www2007.org

Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...

Bernhard Krüpl, Bernhard Pollak, Marcus Herzo...

DAS
2004
Springer

Document Analysis

Rule-Based Structural Analysis of Web Pages

Download tesi.fabio.web.cs.unibo.it

Structural analysis of web pages has been proposed several times and for a number of reasons and purposes, such as the re-flowing of standard web pages to fit a smaller PDA screen....

Fabio Vitali, Angelo Di Iorio, Elisa Ventura Campo...

COMAD
2009

Knowledge Management

Querying for relations from the semi-structured Web

Download www.cse.iitb.ac.in

We present a class of web queries whose result is a multi-column relation instead of a collection of unstructured documents as in standard web search. The user specifies the query...

Sunita Sarawagi

TREC
2000

Information Technology

Information Space Based on HTML Structure

Download trec.nist.gov

The main goal for the Information Space system for TREC9 was early precision. To facilitate this, an emphasis was placed on seeking matches from only the TITLE, H1, H2 and H3 tags...

Gregory B. Newby
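
The restriction of matching to TITLE, H1, H2 and H3 text can be sketched with Python's standard `html.parser` module. The collector below indexes only text that appears inside those elements. The scoring function `heading_match_score`, which simply counts distinct query terms found in that text, is a hypothetical stand-in and not the Information Space system's actual ranking.

```python
from html.parser import HTMLParser

# Only text inside these elements is indexed (HTMLParser reports tag names in lower case).
INDEXED_TAGS = {"title", "h1", "h2", "h3"}


class HeadingTextCollector(HTMLParser):
    """Collect the text that appears inside TITLE, H1, H2 and H3 elements."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # number of indexed elements currently open
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in INDEXED_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in INDEXED_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text outside the indexed elements (e.g. <p> body text) is ignored.
        if self.depth > 0:
            self.chunks.append(data)


def heading_terms(html: str) -> set[str]:
    """Lower-cased whitespace-separated terms found in the indexed elements."""
    parser = HeadingTextCollector()
    parser.feed(html)
    parser.close()
    return {word.lower() for word in " ".join(parser.chunks).split()}


def heading_match_score(query: str, html: str) -> int:
    """Hypothetical score: number of distinct query terms present in the indexed text."""
    terms = heading_terms(html)
    return sum(1 for word in set(query.lower().split()) if word in terms)
```

Because body paragraphs are never indexed, a query term that occurs only in running text contributes nothing to the score; this is the trade-off that favors early precision over recall.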
