In recent years, many research systems have been proposed to perform data extraction and automation tasks on Web sources. Since most of today’s Web sources are “human-readable...
Abstract: By means of RDFa it is possible to embed semantic meaning into standard XHTML web pages. Using the meaning, we provide content-sensitive user interfaces for web pages int...
We consider the problem of automatically extracting general lists from the web. Existing approaches are mostly dependent upon either the underlying HTML markup or the visual struc...
Fabio Fumarola, Tim Weninger, Rick Barber, Donato ...
Information extraction from HTML pages has been conventionally treated as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
This work focuses on characterizing information about Web resources and server responses that is relevant to Web caching. The approach is to study a set of URLs at a variety of si...