Traditional approaches for content-based image querying typically compute a single signature for each image based on color histograms, texture, wavelet transforms etc., and return...
Identification of all objects in a dataset whose similarity is not less than a specified threshold is of major importance for management, search, and analysis of data. Set similari...
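As a rough illustration of the threshold condition described above, the following sketch returns every pair of sets whose similarity is not less than a given threshold. It assumes Jaccard similarity as the measure and uses a naive all-pairs scan; both are assumptions made for illustration, as are the names `jaccard` and `threshold_pairs`, and none of this should be read as the method of the work excerpted here.

```python
from typing import FrozenSet, List, Tuple


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def threshold_pairs(
    sets: List[FrozenSet[str]], threshold: float
) -> List[Tuple[int, int, float]]:
    """Return (i, j, similarity) for every pair i < j with similarity >= threshold.

    Hypothetical helper: a quadratic-time scan, shown only to make the
    "not less than a specified threshold" condition concrete.
    """
    result = []
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            sim = jaccard(sets[i], sets[j])
            if sim >= threshold:  # "not less than" the threshold
                result.append((i, j, sim))
    return result


# Illustrative data: the first two sets share 3 of 5 distinct elements.
data = [frozenset("abcd"), frozenset("abce"), frozenset("xyz")]
pairs = threshold_pairs(data, 0.5)
```

The all-pairs scan compares every set with every other, so its cost grows quadratically with the dataset size; it is meant only to pin down the semantics of the query.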
Determining similarities among multimedia objects is a fundamental task in many content-based retrieval, analysis, mining, and exploration applications. Among state-of-the-art sim...
Similarity retrieval mechanisms should utilize generalized quadratic form distance functions as well as the Euclidean distance function since ellipsoid queries parameters may vary...
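To make the relationship between the two distance functions concrete, here is a minimal sketch of the generalized quadratic form distance, d_A(x, y) = sqrt((x - y)^T A (x - y)), which reduces to the Euclidean distance when A is the identity matrix. The function name `quadratic_form_distance` and the sample matrices are hypothetical, and A is assumed to be symmetric positive semi-definite so that the value under the square root is non-negative.

```python
import math
from typing import Sequence


def quadratic_form_distance(
    x: Sequence[float], y: Sequence[float], A: Sequence[Sequence[float]]
) -> float:
    """Compute sqrt((x - y)^T A (x - y)).

    Assumes A is a symmetric positive semi-definite n x n matrix
    and that x and y both have length n.
    """
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    total = sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(total)


x = [1.0, 2.0]
y = [4.0, 6.0]

# With the identity matrix the result is the ordinary Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
euclidean = quadratic_form_distance(x, y, identity)

# A different matrix reweights the axes, giving an ellipsoid-shaped query region.
weights = [[2.0, 0.0], [0.0, 0.5]]
weighted = quadratic_form_distance(x, y, weights)
```

Changing the matrix A changes the shape and orientation of the set of points within a fixed distance of the query, which is why a query whose parameters vary needs more than a single fixed Euclidean metric.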
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz