Most of today's multiprocessors have a DistributedShared Memory (DSM) organization, which enables scalability while retaining the convenience of the shared-memory programming...
Angeles G. Navarro, Rafael Asenjo, Emilio L. Zapat...
The polyhedral model provides powerful abstractions to optimize loop nests with regular accesses. Affine transformations in this model capture a complex sequence of execution-reord...
DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must be skewed first in order to make ti...
With the increasing gap between processor speed and memory latency, the performance of data-dominated programs are becoming more reliant on fast data access, which can be improved...
Load balancing and data locality are the two most important factors in the performance of parallel programs on distributed-memory multiprocessors. A good balancing scheme should e...