Abstract. Clustering data described by categorical attributes is a challenging task in data mining applications. Unlike numerical attributes, it is difficult to define a distance b...
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to ge...
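As a rough illustration of the map function described in the snippet above, here is a minimal single-process sketch in Python. It is a toy word count, not the implementation from the paper. The names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the reduce step follows the usual description of the model, since the snippet is cut off before it describes one.

```python
from collections import defaultdict


def map_fn(key, value):
    """Map: take one (document name, text) pair, emit (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: merge all values emitted for one intermediate key."""
    return key, sum(values)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy in-memory driver (an assumption for illustration only)."""
    groups = defaultdict(list)
    # Map phase: apply map_fn to every input pair and group by emitted key.
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    # Reduce phase: apply reduce_fn once per intermediate key.
    return dict(reduce_fn(k, vs) for k, vs in groups.items())


docs = [("a.txt", "the quick fox"), ("b.txt", "the lazy dog the end")]
counts = run_mapreduce(docs, map_fn, reduce_fn)
```

A real deployment would distribute the map and reduce phases across machines; this sketch only shows the shape of the two user-supplied functions.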
We present an extension of convex-hull non-negative matrix factorization (CH-NMF), which was recently proposed as a large-scale variant of convex non-negative matrix factorization ...
Kristian Kersting, Mirwaes Wahabzada, Christian Th...
Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. The power of data mining tools to extract hidden information that can...
This paper presents a comprehensive statistical analysis of workloads collected on data-intensive clusters and Grids. The analysis is conducted at different levels, including Virt...