We introduce perturbation kernels, a new class of similarity measure for information retrieval that casts word similarity in terms of multi-task learning. Perturbation kernels mode...
In this paper, we describe a document clustering method called noveltybased document clustering. This method clusters documents based on similarity and novelty. The method assigns...
A natural consequence of the widespread adoption of XML as standard for information representation and exchange is the redundant storage of large amounts of persistent XML documen...
Social media sites (e.g., Flickr, YouTube, and Facebook) are a popular distribution outlet for users looking to share their experiences and interests on the Web. These sites host ...
Data collections often have inconsistencies that arise due to a variety of reasons, and it is desirable to be able to identify and resolve them efficiently. Set similarity queries ...
Marios Hadjieleftheriou, Amit Chandel, Nick Koudas...