Web-scale knowledge extraction from semi-structured tables

A wealth of knowledge is encoded in the form of tables on the World Wide Web. We propose a classification algorithm and a rich feature set for automatically recognizing layout tables and attribute/value tables. We report the frequencies of these table types in a large-scale analysis of the Web and propose open challenges for extracting semantic triples (knowledge) from attribute/value tables. We then describe a solution to a key problem in extracting semantic triples: protagonist detection, i.e., finding the subject of the table, which often does not appear in the table itself. For 79% of our Web tables, our method ranks the correct protagonist among its top three candidates.
Categories and Subject Descriptors: I.2.6 [Artificial Intelligence]: Learning – knowledge acquisition.
General Terms: Algorithms, Experimentation, Measurement.
Keywords: Information extraction, structured data, web tables, classification.
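
The abstract does not describe the classifier's features or how candidate protagonists are ranked, so the following is only a minimal Python sketch under stated assumptions. It assumes a table has already been classified as a two-column attribute/value table and that a protagonist (or a ranked list of candidates) is supplied from outside. It shows how such a table yields (subject, attribute, value) triples and how a top-k accuracy such as "correct protagonist among the top three candidates" can be computed. The names `table_to_triples` and `top_k_accuracy` and the toy data are hypothetical illustrations, not the authors' code.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers for illustration only; not taken from the paper.
Triple = Tuple[str, str, str]  # (subject, attribute, value)


def table_to_triples(protagonist: str, rows: Sequence[Sequence[str]]) -> List[Triple]:
    """Emit one (protagonist, attribute, value) triple per attribute/value row.

    Assumes the table was already classified as a two-column attribute/value
    table; the protagonist is passed in because it is often absent from the table.
    """
    triples: List[Triple] = []
    for row in rows:
        if len(row) != 2:
            continue  # skip rows that are not exactly one attribute and one value
        attribute, value = row[0].strip(), row[1].strip()
        if attribute and value:  # skip rows with an empty cell
            triples.append((protagonist, attribute, value))
    return triples


def top_k_accuracy(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int = 3) -> float:
    """Fraction of tables whose correct protagonist appears among the first k candidates."""
    if len(ranked) != len(gold):
        raise ValueError("expected one gold protagonist per ranked candidate list")
    if not gold:
        return 0.0
    hits = sum(1 for candidates, answer in zip(ranked, gold) if answer in candidates[:k])
    return hits / len(gold)


if __name__ == "__main__":
    rows = [["Director", "Ridley Scott"], ["Release date", " 1982 "], ["orphan cell"]]
    print(table_to_triples("Blade Runner", rows))
    # → [('Blade Runner', 'Director', 'Ridley Scott'), ('Blade Runner', 'Release date', '1982')]

    # Two toy tables: the gold protagonist is ranked 2nd in the first and 4th in the second.
    ranked = [["Ridley Scott", "Blade Runner", "1982"], ["a", "b", "c", "d"]]
    print(top_k_accuracy(ranked, ["Blade Runner", "d"], k=3))  # → 0.5
```

Scoring with `k=3` mirrors the top-three criterion behind the 79% figure reported in the abstract; the data above is invented and says nothing about the paper's actual evaluation.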
Eric Crestan, Patrick Pantel
Added 18 Jul 2010
Updated 18 Jul 2010
Type Conference
Year 2010
Where WWW