good data partitioning scheme is the need of the time. However it is very diflcult to arrive at a good solution as the number of possible dutupartitionsfor a given real lifeprogra...
This paper presents a technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on object...
Switch-based interconnects are used in a number of application domains including parallel system interconnects, local area networks, and wide area networks. However, very few swit...
Rajeev Sivaram, Craig B. Stunkel, Dhabaleswar K. P...
Many programs exploit shared-memory parallelism using multithreading. Threaded codes typically use locks to coordinate access to shared data. In many cases, contention for locks r...
Nathan R. Tallent, John M. Mellor-Crummey, Allan P...
Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming common place. Current t...
Arun Babu Nagarajan, Frank Mueller, Christian Enge...