When data resides on tertiary storage, clustering is the key to achieving high retrieval performance. However, a straightforward approach to clustering massive amounts of data on ...
Representative-based clustering algorithms are quite popular because of their relatively high speed and their sound theoretical foundation. On the other hand, the clusters the...
Abstract. We describe a clustering approach with an emphasis on detecting coherent structures in a complex dataset and illustrate its effectiveness with computer vision applicat...
MapReduce is emerging as an important programming model for large-scale data-parallel applications such as web indexing, data mining, and scientific simulation. Hadoop is an open-...
Matei Zaharia, Andy Konwinski, Anthony D. Joseph, ...
Abstract. In this paper, we present the ICSI speaker diarization system. This system was used in the 2007 National Institute of Standards and Technology (NIST) Rich Transcription e...