The performance of applications on large shared-memory multiprocessors with coherent caches depends on the interaction between the granularity of data sharing, the size of the coh...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
Traditional list schedulers order instructions based on an optimistic estimate of the load latency imposed by the hardware and therefore cannot respond to variations in memory lat...
Software distributed-shared-memory (DSM) systems provide an appealing target for parallelizing compilers due to their flexibility. Previous studies demonstrate such systems can prov...
A key step in program optimization is the determination of optimal values for code optimization parameters such as cache tile sizes and loop unrolling factors. One approach, which...
This paper describes an algorithm for deriving data and computation partitions on scalable shared memory multiprocessors. The algorithm establishes affinity relationshi...