Background: Clustering methods are widely used on gene expression data to categorize genes with similar expression profiles. Finding an appropriate (dis)similarity measure is crit...
Kyungpil Kim, Shibo Zhang, Keni Jiang, Li Cai, In-...
Where Information Retrieval (IR) and Text Categorization delivers a set of (ranked) documents according to a query, users of large document collections would rather like to receiv...
The aim of data mining is to find novel and actionable insights in data. However, most algorithms typically just find a single (possibly non-novel/actionable) interpretation of th...
Privacy, preservation and performance (“3 P’s”) are central design objectives for distributed data management systems. However, these objectives tend to compete with one ano...
In recent years, several frameworks have been developed for processing very large quantities of data on large clusters of commodity PCs. These frameworks have focused on fault-tole...