BD-CATS: big data clustering at trillion particle scale

Modern cosmology and plasma physics codes are now capable of simulating trillions of particles on petascale systems. Each timestep output from such simulations is on the order of tens of terabytes. Summarizing and analyzing raw particle data is challenging, and scientists often focus on density structures, whether in real 3D space or in a high-dimensional phase space. In this work, we develop a highly scalable version of the clustering algorithm DBSCAN and apply it to the largest datasets produced by state-of-the-art codes. Our system, called BD-CATS, is the first one capable of performing end-to-end analysis at trillion particle scale (including: loading the data, geometric partitioning, computing kd-trees, performing clus
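
For readers unfamiliar with DBSCAN, the following is a minimal serial sketch of the textbook algorithm that the abstract names. It is not the BD-CATS implementation: it runs on a single process, and it replaces the kd-tree neighbor search mentioned above with a brute-force scan to stay short. The names `dbscan`, `region_query`, `eps`, and `min_pts` are illustrative choices, not identifiers taken from the paper.

```python
from collections import deque
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself.

    Brute force, O(n) per query. A kd-tree would normally replace this scan.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point (0, 1, ...), or NOISE."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue  # already assigned to a cluster or marked as noise
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        # points[i] is a core point: start a new cluster and grow it.
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
        cluster += 1
    return labels
```

Calling `dbscan(points, eps, min_pts)` on a list of coordinate tuples returns a list of labels of the same length. Points in dense regions share a non-negative label, and isolated points receive `NOISE`.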
Added 17 Apr 2016
Updated 17 Apr 2016
Type Journal
Year 2015
Where SC