The role of documents vs. queries in extracting class attributes from text

15 years 9 months ago


Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources of data in textual information extraction. The differences are quantified as part of a large-scale study on extracting prominent attributes or quantifiable properties of classes (e.g., top speed, price and fuel consumption for CarModel) from unstructured text. In a head-to-head qualitative comparison, a lightweight extraction method produces class attributes that are 45% more accurate on average, when acquired from query logs rather than Web documents.

Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing; I.2.7 [Artificial Intelligence]: Natural Language Processing; I.2.6 [Artificial Intelligence]: Learning; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms: Algorithms, Experimentation

Keywords: Knowledge acquisit...

Marius Pasca, Benjamin Van Durme, Nikesh Garera
