Sciweavers

» Space-Efficient Algorithms for Document Retrieval
EDBT
2006
ACM
15 years 12 months ago
Indexing Shared Content in Information Retrieval Systems
Abstract. Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separa...
Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Mi...
ICPR
2010
IEEE
15 years 1 months ago
Toward Massive Scalability in Image Matching
A method for image matching from partial blurry images is presented that leverages existing text retrieval algorithms to provide a solution that scales to hundreds of thousands of...
Jorge Moraleda, Jonathan Hull
SAC
2008
ACM
14 years 11 months ago
XEdge: clustering homogeneous and heterogeneous XML documents using edge summaries
In this paper we propose a unified clustering algorithm for both homogeneous and heterogeneous XML documents. Depending on the type of the XML documents, the proposed algorithm mo...
Panagiotis Antonellis, Christos Makris, Nikos Tsir...
ITCC
2003
IEEE
15 years 5 months ago
A Method for Calculating Term Similarity on Large Document Collections
We present an efficient algorithm called the Quadtree Heuristic for identifying a list of similar terms for each unique term in a large document collection. Term similarity is de...
Wolfgang W. Bein, Jeffrey S. Coombs, Kazem Taghva
SIGIR
2011
ACM
14 years 2 months ago
When documents are very long, BM25 fails!
We reveal that the Okapi BM25 retrieval function tends to overly penalize very long documents. To address this problem, we present a simple yet effective extension of BM25, namel...
Yuanhua Lv, ChengXiang Zhai
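
As a rough illustration of the length effect this abstract describes, the sketch below computes the standard Okapi BM25 score of a single query term in a single document. The function name, the example numbers, and the defaults k1 = 1.2 and b = 0.75 are assumptions chosen for illustration, and the IDF is a common non-negative variant. This is not the authors' code and it does not implement their proposed extension.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Standard Okapi BM25 score of one query term in one document.

    All parameter names are illustrative assumptions, not taken from the paper.
    """
    # Inverse document frequency; the "1 +" inside the log keeps it non-negative.
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization: grows linearly with doc_len / avg_doc_len when b > 0.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Saturating term-frequency component, scaled by the IDF.
    return idf * tf * (k1 + 1.0) / (tf + norm)


# Same term frequency, two hypothetical document lengths.
short_score = bm25_term_score(tf=5, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_score = bm25_term_score(tf=5, doc_len=10_000, avg_doc_len=100, n_docs=1000, doc_freq=10)
print(short_score, long_score)
```

With the term frequency held fixed, the score shrinks as the document gets longer, which is the normalization behavior the abstract refers to. Setting b = 0 switches length normalization off, so both documents then receive the same score.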