Sciweavers

115 search results - page 16 / 23
» Fusion of Loops for Parallelism and Locality
IEEEPACT
1999
IEEE
15 years 4 months ago
On Reducing False Sharing while Improving Locality on Shared Memory Multiprocessors
The performance of applications on large shared-memory multiprocessors with coherent caches depends on the interaction between the granularity of data sharing, the size of the coh...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
PLDI
1995
ACM
15 years 3 months ago
Improving Balanced Scheduling with Compiler Optimizations that Increase Instruction-Level Parallelism
Traditional list schedulers order instructions based on an optimistic estimate of the load latency imposed by the hardware and therefore cannot respond to variations in memory lat...
Jack L. Lo, Susan J. Eggers
LCPC
1997
Springer
15 years 3 months ago
Reducing Synchronization Overhead for Compiler-Parallelized Codes
Software distributed-shared-memory (DSM) systems provide an appealing target for parallelizing compilers due to their flexibility. Previous studies demonstrate such systems can prov...
Hwansoo Han, Chau-Wen Tseng, Peter J. Keleher
ICS
2005
Tsinghua U.
15 years 5 months ago
Think globally, search locally
A key step in program optimization is the determination of optimal values for code optimization parameters such as cache tile sizes and loop unrolling factors. One approach, which...
Kamen Yotov, Keshav Pingali, Paul Stodghill
ICPP
1997
IEEE
15 years 3 months ago
Automatic Partitioning of Data and Computations on Scalable Shared Memory Multiprocessors
Abstract—This paper describes an algorithm for deriving data and computation partitions on scalable shared memory multiprocessors. The algorithm establishes affinity relationshi...
Sudarsan Tandri, Tarek S. Abdelrahman