
CIKM
2004
Springer

Organizing structured web sources by query schemas: a clustering approach

In recent years, the Web has been rapidly “deepened” by the prevalence of online databases. On this deep Web, many sources are structured: they provide structured query interfaces and results. Organizing such structured sources into a domain hierarchy is one of the critical steps toward integrating heterogeneous Web sources. We observe that, for structured Web sources, query schemas (i.e., the attributes in query interfaces) are discriminative representatives of the sources and can therefore be exploited to characterize them. In particular, by viewing query schemas as a type of categorical data, we abstract the problem of source organization into the clustering of categorical data. Our approach hypothesizes that “homogeneous sources” are characterized by the same hidden generative models for their schemas. To find clusters governed by such statistical distributions, we propose a new objective function, model-differentiation, which employs principled hypothesis testing to ma...
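The abstract breaks off before it defines model-differentiation, so the sketch below does not reproduce the authors' objective function. As a loosely related illustration only, it assumes each query schema is a set of attribute names, encodes schemas as categorical (0/1) vectors, and uses a standard Pearson chi-square test of homogeneity (a swapped-in technique, not the paper's) to ask whether two groups of sources plausibly draw their attributes from one distribution. The toy schemas and every function name (`vocabulary`, `to_categorical`, `chi2_sf`, `homogeneity_test`) are hypothetical.

```python
import math
from collections import Counter
from itertools import chain


def vocabulary(schemas):
    """Sorted list of every distinct attribute name across the given query schemas."""
    return sorted(set(chain.from_iterable(schemas)))


def to_categorical(schema, vocab):
    """Encode one query schema (a set of attribute names) as a 0/1 vector over vocab."""
    return tuple(1 if attr in schema else 0 for attr in vocab)


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df.

    Uses the closed forms for even and odd df so only the standard library is needed.
    """
    if df <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_j x^(j - 1/2) / (1*3*...*(2j-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def homogeneity_test(group_a, group_b):
    """Pearson chi-square test of homogeneity between two non-empty groups of schemas.

    Assumption (illustration only): every attribute occurrence in a group is treated
    as an independent draw from that group's attribute distribution.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    counts_a = Counter(chain.from_iterable(group_a))
    counts_b = Counter(chain.from_iterable(group_b))
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    n = n_a + n_b
    stat = 0.0
    attrs = set(counts_a) | set(counts_b)  # only attributes seen in at least one group
    for attr in attrs:
        col = counts_a[attr] + counts_b[attr]
        for observed, row_total in ((counts_a[attr], n_a), (counts_b[attr], n_b)):
            expected = row_total * col / n
            stat += (observed - expected) ** 2 / expected
    df = len(attrs) - 1  # (2 rows - 1) * (columns - 1) for a 2 x k table
    return stat, df, chi2_sf(stat, df)


# Hypothetical toy query schemas for two domains.
books = [
    {"title", "author", "isbn", "publisher"},
    {"title", "author", "keyword"},
    {"title", "author", "isbn"},
    {"title", "author", "publisher", "keyword"},
]
airfare = [
    {"from", "to", "departure date", "return date"},
    {"from", "to", "departure date", "passengers"},
    {"from", "to", "departure date"},
    {"from", "to", "return date", "passengers"},
]

vocab = vocabulary(books + airfare)
print(to_categorical(books[2], vocab))  # (1, 0, 0, 1, 0, 0, 0, 0, 1, 0)
print(homogeneity_test(books[:2], books[2:]))  # (0.0, 4, 1.0)
print(homogeneity_test(books, airfare))  # statistic about 29 on 9 df, p-value below 0.001
```

Splitting the book sources in half gives no evidence of a difference (p-value 1.0), so a clustering procedure built on such a test would treat the halves as candidates for merging, while the book-versus-airfare comparison is clearly rejected. With only a few attributes per source the chi-square approximation is crude, so the numbers are illustrative rather than a faithful model of the paper's method.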
Bin He, Tao Tao, Kevin Chen-Chuan Chang
Type Conference