Overlapping communication with computation is an important optimization on current cluster architectures; its importance is likely to increase as the doubling of processing power ...
Wei-Yu Chen, Dan Bonachea, Costin Iancu, Katherine...
Today’s one-pass analytics applications tend to be data-intensive in nature and require the ability to process high volumes of data efficiently. MapReduce is a popular programm...
Boduo Li, Edward Mazur, Yanlei Diao, Andrew McGreg...
We present compiler techniques for translating OpenMP shared-memory parallel applications into MPI messagepassing programs for execution on distributed memory systems. This transl...
—The ever increasing scale and complexity of large computational systems ask for sophisticated management tools, paving the way toward Autonomic Computing. A first step toward A...
In this paper an optimized k-means implementation on the graphics processing unit (GPU) is presented. NVIDIA’s Compute Unified Device Architecture (CUDA), available from the G8...