We consider the problem of partitioning, in a highly accurate and highly efficient way, a set of n documents lying in a metric space into k non-overlapping clusters. We augment th...
Filippo Geraci, Marco Pellegrini, Paolo Pisati, Fa...
Streaming data analysis has recently attracted attention in numerous applications including telephone records, web documents and clickstreams. For such analysis, single-pass algor...
Liadan O'Callaghan, Adam Meyerson, Rajeev Motwani,...
Due to the enormous size of the web and low precision of user queries, finding the right information from the web can be difficult if not impossible. One approach that tries to ...
This paper describes Armil, a meta-search engine that groups the web snippets returned by auxiliary search engines into disjoint labeled clusters. The cluster labels generated by A...
Filippo Geraci, Marco Pellegrini, Marco Maggini, F...
Users of Web search engines are often forced to sift through the long ordered list of document “snippets” returned by the engines. The IR community has explored document cluste...