Exploring the Similarity Space

14 years 11 months ago


Ranked queries are used to locate relevant documents in text databases. In a ranked query a list of terms is specified, then the documents that most closely match the query are returned, in decreasing order of similarity, as answers. Crucial to the efficacy of ranked querying is the use of a similarity heuristic, a mechanism that assigns a numeric score indicating how closely a document and the query match. In this note we explore and categorise a range of similarity heuristics described in the literature. We have implemented all of these measures in a structured way, and have carried out retrieval experiments with a substantial subset of these measures. Our purpose with this work is threefold: first, in enumerating the various measures in an orthogonal framework we make it straightforward for other researchers to describe and discuss similarity measures; second, by experimenting with a wide range of the measures, we hope to observe which features yield good retrieval behaviour in ...
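To make the idea of a ranked query concrete, the following is a minimal sketch in Python of one possible similarity heuristic. It is an illustration only: the function name `rank`, the whitespace tokenisation, and the particular term weighting (log-scaled term frequency times a log inverse document frequency, normalised by document vector length) are assumptions chosen for brevity, not a measure the abstract singles out.

```python
import math
from collections import Counter


def rank(query, docs, k=3):
    """Return up to k (doc_index, score) pairs, highest similarity first.

    Hypothetical helper: scores each document with a simple
    TF-IDF weighted, length-normalised inner product.
    """
    n = len(docs)
    # Term frequencies per document (naive whitespace tokenisation).
    doc_tfs = [Counter(d.lower().split()) for d in docs]

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tf in doc_tfs:
        df.update(tf.keys())

    def weight(term, count):
        # Assumed weighting: log-scaled tf times a log idf component.
        return (1 + math.log(count)) * math.log(1 + n / df[term])

    q_tf = Counter(query.lower().split())
    scores = []
    for i, tf in enumerate(doc_tfs):
        # Inner product over the terms shared by query and document.
        dot = sum(
            weight(t, qc) * weight(t, tf[t])
            for t, qc in q_tf.items()
            if t in tf
        )
        if dot > 0:
            # Normalise by document vector length so long documents
            # are not favoured merely for containing more terms.
            doc_len = math.sqrt(sum(weight(t, c) ** 2 for t, c in tf.items()))
            scores.append((i, dot / doc_len))

    # Answers are returned in decreasing order of similarity.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

Calling `rank("cat", ["the cat sat on the mat", "dogs chase cats"])` would return only the first document, since this sketch does no stemming and so treats "cats" as a different term. Swapping the body of `weight` or the normalisation step yields a different similarity heuristic within the same overall structure.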

Justin Zobel, Alistair Moffat
