This paper analyzes the impact of hardware multithreading support on the performance of distributed shared-memory DSM multiprocessors built out of heterogeneous, single-chip compu...
Renato J. O. Figueiredo, Jeffrey P. Bradford, Jos&...
Irregular and dynamic parallel applications pose significant challenges to achieving scalable performance on large-scale multicore clusters. These applications often require ongo...
James Dinan, D. Brian Larkins, P. Sadayappan, Srir...
Grid Workflows are emerging as practical programming models for solving large e-scientific problems on the Grid. However, it is typically assumed that the workflow components eith...
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Efficient performance tuning of parallel programs is often hard. In this paper we describe an approach that uses a uni-processor execution of a multithreaded program as reference ...