Sciweavers

45 search results - page 3 / 9
» Automatic Partitioning of Data and Computations on Scalable ...
Sort
View
CF
2006
ACM
13 years 9 months ago
An efficient cache design for scalable glueless shared-memory multiprocessors
Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. In this way, the ...
Alberto Ros, Manuel E. Acacio, José M. Garc...
ICCS
2004
Springer
13 years 11 months ago
Improving Geographical Locality of Data for Shared Memory Implementations of PDE Solvers
On cc-NUMA multi-processors, the non-uniformity of main memory latencies motivates the need for co-location of threads and data. We call this special form of data locality, geogra...
Henrik Löf, Markus Nordén, Sverker Hol...
CODES
2007
IEEE
13 years 12 months ago
Simultaneous synthesis of buses, data mapping and memory allocation for MPSoC
Heterogeneous multiprocessors are emerging as the dominant implementation approach to embedded multiprocessor systems. In addition to having processing elements suited to the targ...
Brett H. Meyer, Donald E. Thomas
ICPP
1998
IEEE
13 years 9 months ago
A memory-layout oriented run-time technique for locality optimization
Exploiting locality at run-time is a complementary approach to a compiler approach for those applications with dynamic memory access patterns. This paper proposes a memory-layout ...
Yong Yan, Xiaodong Zhang, Zhao Zhang
PPOPP
2009
ACM
14 years 6 months ago
A compiler-directed data prefetching scheme for chip multiprocessors
Data prefetching has been widely used in the past as a technique for hiding memory access latencies. However, data prefetching in multi-threaded applications running on chip multi...
Dhruva Chakrabarti, Mahmut T. Kandemir, Mustafa Ka...