Abstract. Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separa...
Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Mi...
A method for image matching from partial blurry images is presented that leverages existing text retrieval algorithms to provide a solution that scales to hundreds of thousands of...
In this paper we propose a unified clustering algorithm for both homogeneous and heterogeneous XML documents. Depending on the type of the XML documents, the proposed algorithm mo...
We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is de...
We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents. To address this problem, we present a simple yet effective extension of BM25, namel...