Sciweavers

IM
2007

Cluster Generation and Labeling for Web Snippets: A Fast, Accurate Hierarchical Solution

This paper describes Armil, a meta-search engine that groups the web snippets returned by auxiliary search engines into disjoint labeled clusters. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to his/her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labeling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and they use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labeling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in web snippet clustering,...
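The sketch below illustrates the two techniques named in the abstract. It assumes snippets are represented as sets of terms and that the caller supplies a distance function. It implements the plain furthest-point-first (Gonzalez) heuristic for metric k-center, not the faster variant used in Armil. It also uses the standard information gain measure rather than the paper's variant. The function names (`fpf_k_center`, `information_gain`, `label_terms`) and the choice to draw label candidates only from terms that occur inside the cluster are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Set, Tuple, TypeVar

P = TypeVar("P")


def fpf_k_center(points: Sequence[P], k: int,
                 dist: Callable[[P, P], float]) -> Tuple[List[int], List[int]]:
    """Plain furthest-point-first (Gonzalez) heuristic for metric k-center.

    Returns (centers, assign): centers are indices into points, and
    assign[i] is the position in centers of the center nearest to points[i].
    (Assumption: this is the basic algorithm, not Armil's fast version.)
    """
    n = len(points)
    if n == 0 or k <= 0:
        return [], []
    centers = [0]                                   # arbitrary first center
    nearest = [dist(p, points[0]) for p in points]  # distance to closest center
    assign = [0] * n
    while len(centers) < min(k, n):
        # Next center: the point furthest from every center chosen so far.
        far = max(range(n), key=lambda i: nearest[i])
        if nearest[far] == 0:       # every point already coincides with a center
            break
        centers.append(far)
        c = len(centers) - 1
        for i, p in enumerate(points):  # move points now closer to the new center
            d = dist(p, points[far])
            if d < nearest[i]:
                nearest[i] = d
                assign[i] = c
    return centers, assign


def _entropy(probs: Sequence[float]) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(docs: Sequence[Set[str]], in_cluster: Sequence[bool],
                     term: str) -> float:
    """Standard IG between 'term occurs in snippet' and 'snippet is in cluster'."""
    def cond_entropy(idx: List[int]) -> float:
        if not idx:
            return 0.0
        pos = sum(1 for i in idx if in_cluster[i]) / len(idx)
        return _entropy([pos, 1.0 - pos])

    n = len(docs)
    with_t = [i for i in range(n) if term in docs[i]]
    without_t = [i for i in range(n) if term not in docs[i]]
    return (cond_entropy(list(range(n)))
            - len(with_t) / n * cond_entropy(with_t)
            - len(without_t) / n * cond_entropy(without_t))


def label_terms(docs: Sequence[Set[str]], in_cluster: Sequence[bool],
                top: int = 3) -> List[str]:
    """Hypothetical labeler: rank terms seen inside the cluster by IG."""
    # Assumption: only in-cluster terms are candidates, because plain IG
    # also rewards terms that are typical of the *other* clusters.
    candidates = set().union(*(d for d, c in zip(docs, in_cluster) if c))
    ranked = sorted(candidates,
                    key=lambda t: (-information_gain(docs, in_cluster, t), t))
    return ranked[:top]
```

The following example uses the 1-D points `[0.0, 1.0, 10.0, 11.0, 20.0]` with `k = 3` and `abs(a - b)` as the metric. `fpf_k_center` first picks `0.0`, then the furthest point `20.0`, then `10.0`. The result is `centers == [0, 4, 2]` and `assign == [0, 0, 2, 2, 1]`. Because each new center is the point furthest from the existing ones, the basic heuristic is a known 2-approximation for metric k-center. It runs in O(nk) distance computations.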
Added 19 Dec 2010
Updated 19 Dec 2010
Type Journal
Year 2007
Where IM
Authors Filippo Geraci, Marco Pellegrini, Marco Maggini, Fabrizio Sebastiani