A large number of web sites publish pages containing structured information about recognizable concepts, but these data are only partially used by current applications. Although s...
Paolo Papotti, Valter Crescenzi, Paolo Merialdo, M...
We present an experimental framework for Entity Mention Detection in which two different classifiers are combined to exploit Data Redundancy attained through the annotation of a l...
We consider the problem of automatically extracting general lists from the web. Existing approaches are mostly dependent upon either the underlying HTML markup or the visual struc...
Fabio Fumarola, Tim Weninger, Rick Barber, Donato ...
We present in this paper ObjectRunner, a system for extracting, integrating and querying structured data from the Web. Our system harvests real-world items from template-based HTM...
Existing template-independent web data extraction approaches adopt highly ineffective decoupled strategies--attempting to do data record detection and attribute labeling in two se...