We consider the problem of automatically extracting general lists from the web. Existing approaches are mostly dependent upon either the underlying HTML markup or the visual struc...
Fabio Fumarola, Tim Weninger, Rick Barber, Donato ...
Dependable software systems are difficult to develop because developers must understand and address several interdependent and pervasive dependability concerns. Features that addr...
The integration of heterogenous data sources is a crucial step for the upcoming semantic web – if existing information is not integrated, where will the data come from that the s...
Recently, the opportunity of extracting structured data from the Web has been identified by a number of research projects. One such example is that millions of relational-style H...
Daisy Zhe Wang, Xin Luna Dong, Anish Das Sarma, Mi...
We present in this paper ObjectRunner, a system for extracting, integrating and querying structured data from the Web. Our system harvests real-world items from template-based HTM...