Sciweavers

16 search results - page 3 / 4
» Parallel-stage decoupled software pipelining
Sort
View
83
Voted
HPCA
2011
IEEE
14 years 1 months ago
HAQu: Hardware-accelerated queueing for fine-grained threading on a chip multiprocessor
Queues are commonly used in multithreaded programs for synchronization and communication. However, because software queues tend to be too expensive to support finegrained paralle...
Sanghoon Lee, Devesh Tiwari, Yan Solihin, James Tu...
ESA
2004
Springer
166views Algorithms» more  ESA 2004»
15 years 2 months ago
Super Scalar Sample Sort
Sample sort, a generalization of quicksort that partitions the input into many pieces, is known as the best practical comparison based sorting algorithm for distributed memory para...
Peter Sanders, Sebastian Winkel
CGO
2003
IEEE
15 years 2 months ago
Optimizing Memory Accesses For Spatial Computation
In this paper we present the internal representation and optimizations used by the CASH compiler for improving the memory parallelism of pointer-based programs. CASH uses an SSA-b...
Mihai Budiu, Seth Copen Goldstein
MICRO
2007
IEEE
168views Hardware» more  MICRO 2007»
15 years 3 months ago
Global Multi-Threaded Instruction Scheduling
Recently, the microprocessor industry has moved toward chip multiprocessor (CMP) designs as a means of utilizing the increasing transistor counts in the face of physical and micro...
Guilherme Ottoni, David I. August
ISCA
1995
IEEE
120views Hardware» more  ISCA 1995»
15 years 29 days ago
Streamlining Data Cache Access with Fast Address Calculation
For many programs, especially integer codes, untolerated load instruction latencies account for a significant portion of total execution time. In this paper, we present the desig...
Todd M. Austin, Dionisios N. Pnevmatikatos, Gurind...