Abstract--The now commonplace multi-core chips have introduced, by design, a deep hierarchy of memory and cache banks within parallel computers as a tradeoff between the user frien...
Irregular and dynamic parallel applications pose significant challenges to achieving scalable performance on large-scale multicore clusters. These applications often require ongo...
James Dinan, D. Brian Larkins, P. Sadayappan, Srir...
Many parallel systems offer a simple view of memory: all storage cells are addresseduniformly. Despite a uniform view of the memory, the machines differsignificantly in theirmemo...
BSPlib is a small communications library for bulk synchronous parallel (BSP) programming which consists of only 20 basic operations. This paper presents the full de nition of BSPl...
Jonathan M. D. Hill, Bill McColl, Dan C. Stefanesc...
Much of high performance technical computing has moved from shared memory architectures to message based cluster systems. The development and wide adoption of the MPI parallel pro...