Sciweavers

14 search results - page 3 / 3
» Reducing Memory Latency via Read-after-Read Memory Dependenc...
ICS
2000
Tsinghua U.
13 years 9 months ago
Push vs. pull: data movement for linked data structures
As the performance gap between the CPU and main memory continues to grow, techniques to hide memory latency are essential to deliver a high performance computer system. Prefetchin...
Chia-Lin Yang, Alvin R. Lebeck
IEEEPACT
1998
IEEE
13 years 9 months ago
Adaptive Scheduling of Computations and Communications on Distributed Memory Systems
Compile-time scheduling is one approach to extract parallelism which has proved effective when the execution behavior is predictable. Unfortunately, the performance of most priori...
Mayez A. Al-Mouhamed, Homam Najjari
ISCA
2000
IEEE
13 years 9 months ago
Understanding the backward slices of performance degrading instructions
For many applications, branch mispredictions and cache misses limit a processor’s performance to a level well below its peak instruction throughput. A small fraction of static i...
Craig B. Zilles, Gurindar S. Sohi
MICRO
2002
IEEE
13 years 10 months ago
Pointer cache assisted prefetching
Data prefetching effectively reduces the negative effects of long load latencies on the performance of modern processors. Hardware prefetchers employ hardware structures to predic...
Jamison D. Collins, Suleyman Sair, Brad Calder, De...