Sciweavers

WEBDB
2010
Springer

WikiAnalytics: Disambiguation of Keyword Search Results on Highly Heterogeneous Structured Data

Wikipedia infoboxes are an example of a seemingly structured, yet extraordinarily heterogeneous dataset, where any given record has only a tiny fraction of all possible fields. Such data cannot be queried using traditional means without a massive a priori integration effort, since even for a simple request the result values span many record types and fields. On the other hand, solutions based on keyword search are too imprecise to capture the user’s intent. To address these limitations, we propose a system, WIKIANALYTICS, that uses a novel search paradigm to derive tables of precise and complete results from Wikipedia infobox records. The user starts with a keyword search query that finds a superset of the result records, and then browses clusters of records, deciding which are and are not relevant. WIKIANALYTICS uses three categories of clustering features, based respectively on the record types, fields, and values that matched the query keywords. S...
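The workflow described in the abstract (keyword search for a superset of records, then clustering by matched record type, field, and value) can be illustrated with a minimal Python sketch. This is not the WIKIANALYTICS implementation: the toy records, the substring matching, and the names `keyword_search`, `cluster_hits`, and `refine` are all assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical toy records standing in for infobox data (not real Wikipedia content).
RECORDS = [
    {"id": 1, "type": "planet", "fields": {"name": "Mercury", "orbits": "Sun"}},
    {"id": 2, "type": "element", "fields": {"name": "Mercury", "symbol": "Hg"}},
    {"id": 3, "type": "album", "fields": {"title": "Mercury Rising", "artist": "Example Band"}},
    {"id": 4, "type": "planet", "fields": {"name": "Venus", "orbits": "Sun"}},
]


def keyword_search(records, keyword):
    """Return (record, field, value) hits whose field value contains the keyword."""
    kw = keyword.lower()
    hits = []
    for rec in records:
        for field, value in rec["fields"].items():
            if kw in value.lower():  # naive substring match, assumed for the sketch
                hits.append((rec, field, value))
    return hits


def cluster_hits(hits):
    """Group matching record ids by three feature categories: type, field, value."""
    clusters = {
        "type": defaultdict(set),
        "field": defaultdict(set),
        "value": defaultdict(set),
    }
    for rec, field, value in hits:
        clusters["type"][rec["type"]].add(rec["id"])
        clusters["field"][field].add(rec["id"])
        clusters["value"][value].add(rec["id"])
    return clusters


def refine(hits, clusters, keep=(), drop=()):
    """Narrow the superset using clusters the user marked relevant or irrelevant.

    keep and drop are iterables of (category, key) pairs, e.g. ("type", "planet").
    """
    ids = {rec["id"] for rec, _, _ in hits}
    for category, key in keep:
        ids &= clusters[category][key]
    for category, key in drop:
        ids -= clusters[category][key]
    return sorted(ids)


hits = keyword_search(RECORDS, "mercury")
clusters = cluster_hits(hits)
# The user keeps records matched in the "name" field and rejects the "element" type.
selected = refine(hits, clusters, keep=[("field", "name")], drop=[("type", "element")])
print(selected)
```

In this sketch the keyword query deliberately over-retrieves, and the user's keep and drop decisions on clusters do the disambiguation. How the actual system ranks or presents clusters is not stated in the truncated abstract.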
Andrey Balmin, Emiran Curtmola
Added 11 Jul 2010
Updated 11 Jul 2010
Type Conference
Year 2010
Where WEBDB
Authors Andrey Balmin, Emiran Curtmola