Exploring the Similarity Space

Ranked queries are used to locate relevant documents in text databases. In a ranked query a list of terms is specified, then the documents that most closely match the query are returned—in decreasing order of similarity—as answers. Crucial to the efficacy of ranked querying is the use of a similarity heuristic, a mechanism that assigns a numeric score indicating how closely a document and the query match. In this note we explore and categorise a range of similarity heuristics described in the literature. We have implemented all of these measures in a structured way, and have carried out retrieval experiments with a substantial subset of these measures. Our purpose with this work is threefold: first, in enumerating the various measures in an orthogonal framework we make it straightforward for other researchers to describe and discuss similarity measures; second, by experimenting with a wide range of the measures, we hope to observe which features yield good retrieval behaviour in ...
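To make the idea of a ranked query concrete, the following is a minimal sketch of one possible similarity heuristic: a cosine measure over TF-IDF weights. This particular weighting is an assumption chosen for illustration and is not claimed to be the formulation the authors evaluate; the function name `rank`, the whitespace tokenisation, and the sample documents are likewise hypothetical.

```python
import math
from collections import Counter


def rank(query, docs):
    """Return (doc_index, score) pairs in decreasing order of similarity.

    Illustrative only: a cosine measure with log-scaled term frequency
    and a smoothed inverse document frequency (an assumed weighting).
    """
    n = len(docs)
    tokenised = [d.lower().split() for d in docs]

    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenised for t in set(toks))

    def weight_vector(tokens):
        # Terms never seen in the collection carry no weight.
        tf = Counter(tokens)
        return {
            t: (1 + math.log(f)) * math.log(1 + n / df[t])
            for t, f in tf.items()
            if t in df
        }

    def length(vec):
        return math.sqrt(sum(w * w for w in vec.values()))

    q_vec = weight_vector(query.lower().split())
    q_len = length(q_vec)

    results = []
    for i, toks in enumerate(tokenised):
        d_vec = weight_vector(toks)
        d_len = length(d_vec)
        dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        # Guard against empty queries or empty documents.
        score = dot / (q_len * d_len) if q_len and d_len else 0.0
        results.append((i, score))

    # Answers are returned in decreasing order of similarity.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


docs = [
    "text databases store documents",
    "ranked queries locate relevant documents",
    "cats sleep all day",
]
ranking = rank("ranked queries", docs)
```

Here `ranking` lists every document with its score, best match first. Swapping the term-weighting or normalisation inside `weight_vector` and `length` yields a different similarity measure, which is the kind of variation the abstract describes exploring.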
Justin Zobel, Alistair Moffat
Added 23 Dec 2010
Updated 23 Dec 2010
Type Journal
Year 1998