Sciweavers

16 search results - page 3 / 4
» Parallel-stage decoupled software pipelining
Sort
View
HPCA
2011
IEEE
12 years 9 months ago
HAQu: Hardware-accelerated queueing for fine-grained threading on a chip multiprocessor
Queues are commonly used in multithreaded programs for synchronization and communication. However, because software queues tend to be too expensive to support finegrained paralle...
Sanghoon Lee, Devesh Tiwari, Yan Solihin, James Tu...
ESA
2004
Springer
166views Algorithms» more  ESA 2004»
13 years 10 months ago
Super Scalar Sample Sort
Sample sort, a generalization of quicksort that partitions the input into many pieces, is known as the best practical comparison based sorting algorithm for distributed memory para...
Peter Sanders, Sebastian Winkel
CGO
2003
IEEE
13 years 10 months ago
Optimizing Memory Accesses For Spatial Computation
In this paper we present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-b...
Mihai Budiu, Seth Copen Goldstein
MICRO
2007
IEEE
168views Hardware» more  MICRO 2007»
13 years 11 months ago
Global Multi-Threaded Instruction Scheduling
Recently, the microprocessor industry has moved toward chip multiprocessor (CMP) designs as a means of utilizing the increasing transistor counts in the face of physical and micro...
Guilherme Ottoni, David I. August
ISCA
1995
IEEE
120views Hardware» more  ISCA 1995»
13 years 8 months ago
Streamlining Data Cache Access with Fast Address Calculation
For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the desig...
Todd M. Austin, Dionisios N. Pnevmatikatos, Gurind...