Sciweavers

115 search results - page 10 / 23
» Fusion of Loops for Parallelism and Locality
Sort
View
98
Voted
ICPADS
2006
IEEE
15 years 5 months ago
SPM Conscious Loop Scheduling for Embedded Chip Multiprocessors
One of the major factors that can potentially slow down widespread use of embedded chip multiprocessors is lack of efficient software support. In particular, automated code paral...
Liping Xue, Mahmut T. Kandemir, Guangyu Chen, Tayl...
DAC
2005
ACM
16 years 19 days ago
Locality-conscious workload assignment for array-based computations in MPSOC architectures
While the past research discussed several advantages of multiprocessor-system-on-a-chip (MPSOC) architectures from both area utilization and design verification perspectives over ...
Feihui Li, Mahmut T. Kandemir
124
Voted
ICS
1999
Tsinghua U.
15 years 4 months ago
An experimental evaluation of tiling and shackling for memory hierarchy management
On modern computers, the performance of programs is often limited by memory latency rather than by processor cycle time. To reduce the impact of memory latency, the restructuring ...
Induprakas Kodukula, Keshav Pingali, Robert Cox, D...
100
Voted
ASPLOS
1992
ACM
15 years 3 months ago
Access Normalization: Loop Restructuring for NUMA Compilers
: In scalable parallel machines, processors can make local memory accesses much faster than they can make remote memory accesses. In addition, when a number of remote accesses must...
Wei Li, Keshav Pingali
ICPPW
2002
IEEE
15 years 4 months ago
Near-Optimal Loop Tiling by Means of Cache Miss Equations and Genetic Algorithms
The effectiveness of the memory hierarchy is critical for the performance of current processors. The performance of the memory hierarchy can be improved by means of program transf...
Jaume Abella, Antonio González, Josep Llosa...