In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
The growth of the web has directly influenced the increase in the availability of relational data. One of the key problems in mining such data is computing the similarity between o...
Pradeep Muthukrishnan, Dragomir R. Radev, Qiaozhu ...
Many information integration tasks require computing similarity between pairs of objects. Pairwise similarity computations are particularly important in record linkage systems, as...
In this paper, we introduce a new approach to learn dissimilarity for interactive search in content based image retrieval. In literature, dissimilarity is often learned via the fe...
Giang P. Nguyen, Marcel Worring, Arnold W. M. Smeu...
Based on the correlation between expression and ontologydriven gene similarity, we incorporate functional annotations into gene expression clustering validation. A probabilistic f...