Abstract. The similarity join has become an important database primitive to support similarity search and data mining. A similarity join combines two sets of complex objects such t...
Probabilistic data is arriving as a new deluge, driven by technical advances in geographical tracking, multimedia processing, sensor networks and RFID. While similarity search is...
In this paper we consider distributed K-Nearest Neighbor (KNN) search and range query processing in high-dimensional data. Our approach is based on Locality Sensitive Hashing (LSH...
This paper explores the problem of computing pairwise similarity on document collections, focusing on the application of “more like this” queries in the life sciences domain. ...
Recent work in supervised learning of term-based retrieval models has shown that significantly improved accuracy can often be achieved via better model estimation [2, 10, 11, 17]. In ...