Sciweavers

PPOPP
2012
ACM
12 years 3 days ago
CPHASH: a cache-partitioned hash table
CPHASH is a concurrent hash table for multicore processors. CPHASH partitions its table across the caches of cores and uses message passing to transfer lookups/inserts to a partit...
Zviad Metreveli, Nickolai Zeldovich, M. Frans Kaas...
CAL
2010
13 years 1 months ago
SMT-Directory: Efficient Load-Load Ordering for SMT
Memory models like SC, TSO, and PC enforce load-load ordering, requiring that loads from any single thread appear to occur in program order to all other threads. Out-of-order execu...
A. Hilton, A. Roth
PARCO
2003
13 years 5 months ago
Cache Memory Behavior of Advanced PDE Solvers
Three different partial differential equation (PDE) solver kernels are analyzed with respect to cache memory performance on a simulated shared memory computer. The kernels implement...
Dan Wallin, Henrik Johansson, Sverker Holmgren
ISCA
1995
IEEE
13 years 8 months ago
Next Cache Line and Set Prediction
Accurate instruction fetch and branch prediction is increasingly important on today’s wide-issue architectures. Fetch prediction is the process of determining the next instructi...
Brad Calder, Dirk Grunwald
ISCA
1998
IEEE
13 years 8 months ago
Exploiting Spatial Locality in Data Caches Using Spatial Footprints
Modern cache designs exploit spatial locality by fetching large blocks of data called cache lines on a cache miss. Subsequent references to words within the same cache line result...
Sanjeev Kumar, Christopher B. Wilkerson
MICRO
2002
IEEE
13 years 9 months ago
Compiler-directed instruction cache leakage optimization
Excessive power consumption is widely considered a major impediment to designing future microprocessors. With the continued scaling down of threshold voltages, the power consum...
Wei Zhang 0002, Jie S. Hu, Vijay Degalahal, Mahmut...
ISCA
2010
IEEE
13 years 9 months ago
Rethinking DRAM design and organization for energy-constrained multi-cores
DRAM vendors have traditionally optimized the cost-per-bit metric, often making design decisions that incur energy penalties. A prime example is the overfetch feature in DRAM, wher...
Aniruddha N. Udipi, Naveen Muralimanohar, Niladris...
CASES
2006
ACM
13 years 10 months ago
Adaptive object code compression
Previous object code compression schemes have employed static and semiadaptive compression algorithms to reduce the size of instruction memory in embedded systems. The suggestion ...
John Gilbert, David M. Abrahamson
IPPS
2006
IEEE
13 years 10 months ago
SAMIE-LSQ: set-associative multiple-instruction entry load/store queue
The load/store queue (LSQ) is one of the most complex parts of contemporary processors. Its latency is critical for the processor performance and it is usually one of the processo...
Jaume Abella, Antonio González
DELTA
2008
IEEE
13 years 11 months ago
Improved Policies for Drowsy Caches in Embedded Processors
In the design of embedded systems, especially battery-powered systems, it is important to reduce energy consumption. Caches are now used not only in general-purpose processors but a...
Junpei Zushi, Gang Zeng, Hiroyuki Tomiyama, Hiroak...