CLUS: Parallel Subspace Clustering Algorithm on Spark

Subspace clustering techniques were proposed to discover hidden clusters that exist only in certain subsets of the full feature space. However, the time complexity of such algorithms is, in the worst case, exponential in the dimensionality of the dataset. In addition, in current big data scenarios datasets are generally too large to fit on a single machine. The extremely high computational complexity, which results in poor scalability with respect to both the size and the dimensionality of these datasets, strongly motivates us to propose a parallelized subspace clustering algorithm able to handle large, high-dimensional data. To the best of our knowledge, there are no other parallel subspace clustering algorithms that run on top of new-generation big data distributed platforms such as MapReduce and Spark. In this paper we introduce CLUS: a novel parallel solution for subspace clustering based on the SUBCLU algorithm. CLUS uses a new dynamic data partitioning method specifically desig...
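
To make the idea of clusters that exist only in some attribute subsets more concrete, here is a minimal single-machine sketch of the bottom-up search that SUBCLU-style algorithms perform. It is not the CLUS implementation and uses neither Spark nor any data partitioning. The function names (`region_query`, `dbscan_subspace`, `subspace_clusters`), the parameters `eps` and `min_pts`, and the toy data are all assumptions made for illustration. As a further simplification, each candidate subspace is clustered over the whole dataset rather than only over points that already belong to lower-dimensional clusters.

```python
from itertools import combinations


def region_query(points, idx, dims, eps):
    """Indices of points within eps of points[idx], measured only on dims."""
    p = points[idx]
    return [j for j, q in enumerate(points)
            if sum((p[d] - q[d]) ** 2 for d in dims) <= eps ** 2]


def dbscan_subspace(points, dims, eps, min_pts):
    """Plain DBSCAN restricted to the attributes in dims.

    Returns a list of clusters, each a set of point indices.
    """
    labels = {}    # point index -> cluster id, or -1 for noise
    clusters = []
    for i in range(len(points)):
        if i in labels:
            continue
        neighbours = region_query(points, i, dims, eps)
        if len(neighbours) < min_pts:
            labels[i] = -1          # noise for now; may become a border point
            continue
        cid = len(clusters)
        labels[i] = cid
        cluster = {i}
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels.get(j, -1) != -1:
                continue            # already assigned to a cluster
            labels[j] = cid
            cluster.add(j)
            nj = region_query(points, j, dims, eps)
            if len(nj) >= min_pts:  # j is a core point, keep expanding
                queue.extend(nj)
        clusters.append(cluster)
    return clusters


def subspace_clusters(points, eps, min_pts):
    """Bottom-up search: map each subspace (tuple of dims) to its clusters."""
    found = {}
    current = []
    for d in range(len(points[0])):
        cl = dbscan_subspace(points, (d,), eps, min_pts)
        if cl:
            found[(d,)] = cl
            current.append((d,))
    while current:
        k = len(current[0])
        have = set(current)
        candidates = set()
        for a, b in combinations(current, 2):
            merged = tuple(sorted(set(a) | set(b)))
            # Keep a (k+1)-dim candidate only if every k-dim projection
            # already contains clusters (monotonicity-based pruning).
            if len(merged) == k + 1 and all(
                    s in have for s in combinations(merged, k)):
                candidates.add(merged)
        current = []
        for dims in sorted(candidates):
            cl = dbscan_subspace(points, dims, eps, min_pts)
            if cl:
                found[dims] = cl
                current.append(dims)
    return found


# Toy data: two groups that are tight in attributes 0 and 1,
# while attribute 2 is spread out and carries no cluster structure.
points = [
    (0.0, 0.0, 0.0), (0.5, 0.0, 10.0), (0.0, 0.5, 20.0),
    (0.5, 0.5, 30.0), (0.2, 0.2, 40.0),
    (10.0, 10.0, 5.0), (10.5, 10.0, 15.0), (10.0, 10.5, 25.0),
    (10.5, 10.5, 35.0), (10.2, 10.2, 45.0),
]
found = subspace_clusters(points, eps=1.5, min_pts=3)
```

In this toy run the search never evaluates the full three-dimensional space, because attribute 2 holds no clusters on its own. The number of candidate subspaces can still grow exponentially with dimensionality in the worst case, and each candidate requires a clustering pass over the data, which is the cost the abstract cites as motivation for parallelization.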

Bo Zhu, Alexandru Mara, Alberto Mozo

ADBIS 2015 | Database |

Post Info

Added	13 Apr 2016
Updated	13 Apr 2016
Type	Journal
Year	2015
Where	ADBIS
Authors	Bo Zhu, Alexandru Mara, Alberto Mozo
