Sciweavers

661 search results - page 97 / 133
» Increasing Processor Performance by Implementing Deeper Pipe...
ISCA
2007
IEEE
15 years 6 months ago
Late-binding: enabling unordered load-store queues
Conventional load/store queues (LSQs) are an impediment to both power-efficient execution in superscalar processors and scaling to large-window designs. In this paper, we propose...
Simha Sethumadhavan, Franziska Roesner, Joel S. Em...
DAC
2008
ACM
16 years 27 days ago
Parallelizing CAD: a timely research agenda for EDA
The relative decline of single-threaded processor performance, coupled with the ongoing shift towards on-chip parallelism, requires that CAD applications run efficiently on paralle...
Bryan C. Catanzaro, Kurt Keutzer, Bor-Yiing Su
VIIP
2001
15 years 1 month ago
Using Graphics Cards for Quantized FEM Computations
Graphics cards exercise increasingly more computing power and are highly optimized for high data transfer volumes. In contrast, typical workstations perform badly when data exceeds...
Martin Rumpf, Robert Strzodka
ICPP
1999
IEEE
15 years 4 months ago
Trace-Level Reuse
Trace-level reuse is based on the observation that some traces (dynamic sequences of instructions) are frequently repeated during the execution of a program, and in many cases, th...
Antonio González, Jordi Tubella, Carlos Mol...
IWMM
2010
Springer
15 years 3 months ago
The locality of concurrent write barriers
Concurrent and incremental collectors require barriers to ensure correct synchronisation between mutator and collector. The overheads imposed by particular barriers on particular ...
Laurence Hellyer, Richard Jones, Antony L. Hosking