Document clustering is a powerful technique that has been widely used for organizing data into smaller and manageable information kernels. Several approaches have been proposed...
Clustering is a basic task in a variety of machine learning applications. Partitioning a set of input vectors into compact, wellseparated subsets can be severely affected by the p...
Pedro A. Forero, Vassilis Kekatos, Georgios B. Gia...
Background: Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotat...
We propose to use MapReduce to quickly test new retrieval approaches on a cluster of machines by sequentially scanning all documents. We present a small case study in which we use ...
Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space. Despite significant research, these methods have re...