—This paper explores the use of compiler optimizations which optimize the layout of instructions in memory. The target is to enable the code to make better use of the underlying ...
This paper presents a technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on object...
Characterizing shared-memory applications provides insight to design efficient systems, and provides awareness to identify and correct application performance bottlenecks. Configu...
Tools for parallel systems today range from specification over debugging to performance analysis and more. Typically, they help the programmers of parallel algorithms from the ea...
Pipelining has been used in the design of many PRAM algorithms to reduce their asymptotic running time. Paul, Vishkin, and Wagener (PVW) used the approach in a parallel implementat...