In many Web search applications, similarities between objects of one type (say, queries) can be affected by the similarities between their interrelated objects of another type (sa...
Recent work in deduplication has shown that collective deduplication of different attribute types can improve performance. But although these techniques cluster the attributes col...
Redescription mining is a newly introduced data mining problem that seeks to find subsets of data that afford multiple definitions. It can be viewed as a generalization of associa...
Abstract The notorious "dimensionality curse" is a wellknown phenomenon for any multi-dimensional indexes attempting to scale up to high dimensions. One well-known approa...
Microaggregation is one of the most employed microdata protection methods. The idea is to build clusters of at least k original records, and then replace them with the centroid of...