Unexpected results in automatic list extraction on the web



The discovery and extraction of general lists on the Web continues to be an important problem facing the Web mining community. Numerous studies claim to automatically extract structured data (e.g., lists, record sets, and tables) from the Web for various purposes. Our own recent experiences have shown that the list-finding methods used as part of these larger frameworks do not generalize well and therefore ought to be reevaluated. This paper briefly describes some of the current approaches and tests them on various list pages. Based on our findings, we conclude that analyzing a Web page's DOM structure is not sufficient for the general list-finding task.

Tim Weninger, Fabio Fumarola, Rick Barber, Jiawei Han, Donato Malerba


General List | Important Problem | Information Technology | SIGKDD 2010 | Web Mining Community


» HyLiEn: a hybrid approach to general list extraction on the web

» Bringing order to the Web: automatically categorizing search results

» Mining Web Snippets to Answer List Questions

» Automatic Acquisition of Usage Information for Language Resources

» Automatic wrapper maintenance for semistructured web sources using results from previous q...

» Methods for Domain-Independent Information Extraction from the Web: An Experimental Comparis...

» Automatic Recommendation for Online Users Using Web Usage Mining

» A two-phase rule generation and optimization approach for wrapper generation

Post Info

Added	21 May 2011
Updated	21 May 2011
Type	Journal
Year	2010
Where	SIGKDD
Authors	Tim Weninger, Fabio Fumarola, Rick Barber, Jiawei Han, Donato Malerba
