Sciweavers

» HaLoop: Efficient Iterative Data Processing on Large Cluster...
SIGMOD
2007
ACM
Map-reduce-merge: simplified relational data processing on large clusters
Map-Reduce is a programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. Through ...
Hung-chih Yang, Ali Dasdan, Ruey-Lung Hsiao, Dougl...
DMKD
1997
ACM
A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining
Partitioning a large set of objects into homogeneous clusters is a fundamental operation in data mining. The k-means algorithm is best suited for implementing this operation becau...
Zhexue Huang
BMCBI
2010
A highly efficient multi-core algorithm for clustering extremely large datasets
Background: In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput t...
Johann M. Kraus, Hans A. Kestler
PVLDB
2008
SCOPE: easy and efficient parallel processing of massive data sets
Companies providing cloud-scale services have an increasing need to store and analyze massive data sets such as search logs and click streams. For cost and performance reasons, pr...
Ronnie Chaiken, Bob Jenkins, Per-Åke Larson,...
ICASSP
2010
IEEE
Swift: Scalable weighted iterative sampling for flow cytometry clustering
Flow cytometry (FC) is a powerful technology for rapid multivariate analysis and functional discrimination of cells. Current FC platforms generate large, high-dimensional datasets...
Iftekhar Naim, Suprakash Datta, Gaurav Sharma, Jam...