Two problems from the recently published “NAS Parallel Benchmarks” have been implemented on three advanced parallel computer systems. These two benchmarks are the following: (...
Chip multiprocessors designed for streaming applications, such as the Cell BE, offer impressive peak performance but suffer from limited bandwidth to off-chip main memory. As the number o...
The paper presents the design and development of an online remote trace measurement and analysis system. The work combines the strengths of the TAU performance system with that of ...
Holger Brunst, Allen D. Malony, Sameer Shende, Rob...
In this paper we propose a new approach for scheduling data parallel applications on the Grid using irregular array distributions. We implement the scheduler as a new case study fo...
Abstract. The standard serial algorithm for strongly connected components is based on depth-first search, which is difficult to parallelize. We describe a divide-and-conquer algorithm ...