Similarity search and similarity join on strings are important for applications such as duplicate detection, error detection, data cleansing, or comparison of biological sequences....
Search engine click logs provide an invaluable source of relevance information, but this information is biased because we do not know which documents from the result list the users have...
We propose and evaluate two indexing schemes for improving the efficiency of data retrieval in high-dimensional databases that are incomplete. These schemes are novel in that the ...
This paper proposes a general framework for searching large distributed repositories. Examples of such repositories include sites with music/video content, distributed digital lib...
Most Information Retrieval models take documents as Bag-of-Words and are thereby bound to the language of the documents. In this paper, we present an approach using Linked...