Parallel programming continues to be difficult, despite substantial and ongoing research aimed at making it tractable. Especially dismaying is the gulf between theory and the pract...
Per-core local (scratchpad) memories allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architect...
Stamatis G. Kavadias, Manolis Katevenis, Michail Z...
As parallel jobs get bigger in size and finer in granularity, “system noise” is increasingly becoming a problem. In fact, fine-grained jobs on clusters with thousands of SMP...
Dan Tsafrir, Yoav Etsion, Dror G. Feitelson, Scott...
In this paper we describe a cellular automaton (CA) used to perform the watershed transform in N-D images. Our method is based on image integration via the Ford-Bellman shortest p...
This paper presents an overview of our work, concerning a complete end-to-end framework for automatically generating message passing parallel code for tiled nested for-loops. It c...
Georgios I. Goumas, Nikolaos Drosinos, Maria Athan...