Sciweavers

HIPC
2007
Springer
13 years 11 months ago
Direct Coherence: Bringing Together Performance and Scalability in Shared-Memory Multiprocessors
Traditional directory-based cache coherence protocols suffer from long-latency cache misses as a consequence of the indirection introduced by the home node, which must be accessed...
Alberto Ros, Manuel E. Acacio, José M. Garc...
IPPS
2007
IEEE
13 years 11 months ago
Performance Analysis of a Family of WHT Algorithms
This paper explores the correlation of instruction counts and cache misses to runtime performance for a large family of divide and conquer algorithms to compute the Walsh–Hadama...
Michael Andrews, Jeremy Johnson
ICSEA
2007
IEEE
13 years 11 months ago
A Model for the Effect of Caching on Algorithmic Efficiency in Radix based Sorting
— This paper demonstrates that the algorithmic performance of end user programs may be greatly affected by the two or three level caching scheme of the processor, and we introduc...
Arne Maus, Stein Gjessing
DATE
2008
IEEE
171views Hardware» more  DATE 2008»
13 years 11 months ago
Cache Aware Mapping of Streaming Applications on a Multiprocessor System-on-Chip
Efficient use of the memory hierarchy is critical for achieving high performance in a multiprocessor systemon-chip. An external memory that is shared between processors is a bottl...
Arno Moonen, Marco Bekooij, Rene van den Berg, Jef...
DATE
2008
IEEE
165views Hardware» more  DATE 2008»
13 years 11 months ago
Dynamic Round-Robin Task Scheduling to Reduce Cache Misses for Embedded Systems
Modern embedded CPU systems rely on a growing number of software features, but this growth increases the memory footprint and increases the need for efficient instruction and data...
Ken W. Batcher, Robert A. Walker
HPCA
2005
IEEE
14 years 5 months ago
Using Virtual Load/Store Queues (VLSQs) to Reduce the Negative Effects of Reordered Memory Instructions
The use of large instruction windows coupled with aggressive out-oforder and prefetching capabilities has provided significant improvements in processor performance. In this paper...
Aamer Jaleel, Bruce L. Jacob
DAC
2008
ACM
14 years 6 months ago
Latency and bandwidth efficient communication through system customization for embedded multiprocessors
We present a cross-layer customization methodology for latency and bandwidth efficient inter-core communication in embedded multiprocessors. The methodology integrates compiler, o...
Chenjie Yu, Peter Petrov