K-Means clustering is widely used in information retrieval and data mining. Distributed K-Means variants have already been proposed, but none of the past algorithms scales to large...
Odysseas Papapetrou, Wolf Siberski, Fabian Leitrit...
We introduce a robust and efficient framework called CLUMP (CLustering Using Multiple Prototypes) for unsupervised discovery of structure in data. CLUMP relies on finding multip...
DryadLINQ is a system and a set of language extensions that enable a new programming model for large scale distributed computing. It generalizes previous execution environments su...
Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Bud...
The vision of grid computing is to make computational power, storage capacity, data and applications available to users as readily as electricity and other utilities. Grid infrast...
Navonil Mustafee, Anders Alstad, Bjorn Larsen, Sim...
—“Big Data” in map-reduce (M-R) clusters is often fundamentally temporal in nature, as are many analytics tasks over such data. For instance, display advertising uses Behavio...
Badrish Chandramouli, Jonathan Goldstein, Songyun ...