CF
15 years 4 months ago
2006 ACM
The growing speed gap between memory and processor makes efficient use of the cache ever more important for reaching high performance. One of the most important ways to improve cac...
CF
15 years 8 months ago
2006 ACM
In a multi-programmed computing environment, threads of execution exhibit different runtime characteristics and hardware resource requirements. Not only do the behaviors of distin...
CF
15 years 8 months ago
2006 ACM
Regular distributions for storing dense matrices on parallel systems are not always used in practice. In many scientific applicati RUMMA) [1] to handle irregularly distributed mat...
CF
15 years 5 months ago
2006 ACM
This paper presents our experience mapping the OpenMP parallel programming model to the IBM Cyclops-64 (C64) architecture. The C64 employs a many-core-on-a-chip design that integrates...
CF
15 years 5 months ago
2006 ACM
Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. In this way, the ...