We present a document routing and index partitioning scheme for scalable similarity-based search of documents in a large corpus. We consider the case when similarity-based search ...
A good distance metric is crucial for many data mining tasks. To learn a metric in the unsupervised setting, most metric learning algorithms project observed data to a lowdimensio...
The discovery of subsets with special properties from binary data has been one of the key themes in pattern discovery. Pattern classes such as frequent itemsets stress the co-occu...
Eino Hinkkanen, Hannes Heikinheimo, Heikki Mannila...
In some applications such as filling in a customer information form on the web, some missing values may not be explicitly represented as such, but instead appear as potentially va...
In this paper we address the problem of detecting topics in large-scale linked document collections. Recently, topic detection has become a very active area of research due to its...