Sciweavers

CF
2006
ACM
14 years 11 months ago
Intermediately executed code is the key to find refactorings that improve temporal data locality
The growing speed gap between memory and processor makes an efficient use of the cache ever more important to reach high performance. One of the most important ways to improve cac...
Kristof Beyls, Erik H. D'Hollander
CF
2006
ACM
15 years 2 months ago
Dynamic thread assignment on heterogeneous multiprocessor architectures
In a multi-programmed computing environment, threads of execution exhibit different runtime characteristics and hardware resource requirements. Not only do the behaviors of distin...
Michela Becchi, Patrick Crowley
CF
2006
ACM
15 years 2 months ago
Memory efficient parallel matrix multiplication operation for irregular problems
Regular distributions for storing dense matrices on parallel systems are not always used in practice. In many scientific applicati RUMMA) [1] to handle irregularly distributed mat...
Manojkumar Krishnan, Jarek Nieplocha
CF
2006
ACM
15 years 21 days ago
Landing OpenMP on Cyclops-64: an efficient mapping of OpenMP to a many-core system-on-a-chip
This paper presents our experience mapping OpenMP parallel programming model to the IBM Cyclops-64 (C64) architecture. The C64 employs a many-core-on-a-chip design that integrates...
Juan del Cuvillo, Weirong Zhu, Guang R. Gao
CF
2006
ACM
15 years 21 days ago
An efficient cache design for scalable glueless shared-memory multiprocessors
Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. In this way, the ...
Alberto Ros, Manuel E. Acacio, José M. Garc...