Sciweavers

41 search results - page 3 / 9
» Exploiting program cyclic behavior to reduce memory latency ...
HPCA
2006
IEEE
14 years 5 months ago
A decoupled KILO-instruction processor
Building processors with large instruction windows has been proposed as a mechanism for overcoming the memory wall, but finding a feasible and implementable design has been an elu...
Miquel Pericàs, Adrián Cristal, Rube...
CASES
2003
ACM
13 years 10 months ago
Exploiting bank locality in multi-bank memories
Bank locality can be defined as localizing the number of load/store accesses to a small set of memory banks at a given time. An optimizing compiler can modify a given input code t...
Guilin Chen, Mahmut T. Kandemir, Hendra Saputra, M...
ECRTS
2007
IEEE
13 years 11 months ago
WCET-Directed Dynamic Scratchpad Memory Allocation of Data
Many embedded systems feature processors coupled with a small and fast scratchpad memory. Unlike with caches, allocation of data to scratchpad memory must be handled by...
Jean-François Deverge, Isabelle Puaut
EUROPAR
2009
Springer
13 years 8 months ago
Fast and Efficient Synchronization and Communication Collective Primitives for Dual Cell-Based Blades
The Cell Broadband Engine (Cell BE) is a heterogeneous multi-core processor specifically designed to exploit thread-level parallelism. Its memory model comprises a common shared ...
Epifanio Gaona, Juan Fernández, Manuel E. A...
ASPDAC
2004
ACM
13 years 10 months ago
Fast, predictable and low energy memory references through architecture-aware compilation
The design of future high-performance embedded systems is hampered by two problems: First, the required hardware needs more energy than is available from batteries. Second, curren...
Peter Marwedel, Lars Wehmeyer, Manish Verma, Stefa...