Sciweavers

» Automatic Performance Debugging of SPMD Parallel Programs
ASPLOS
2009
ACM
15 years 10 months ago
Dynamic prediction of collection yield for managed runtimes
The growth in complexity of modern systems makes it increasingly difficult to extract high performance. The software stacks for such systems typically consist of multiple layers a...
Michal Wegiel, Chandra Krintz
IEEEPACT
2008
IEEE
15 years 4 months ago
Exploiting loop-dependent stream reuse for stream processors
Memory access limits the performance of stream processors. By exploiting the reuse of data held in the Stream Register File (SRF), an on-chip storage, the number of memory acc...
Xuejun Yang, Ying Zhang, Jingling Xue, Ian Rogers,...
ICPP
2003
IEEE
15 years 3 months ago
Procedural Level Address Offset Assignment of DSP Applications with Loops
Automatic optimization of address offset assignment for DSP applications, which reduces the number of address arithmetic instructions to meet the tight memory size restrictions an...
Youtao Zhang, Jun Yang 0002
IPPS
2000
IEEE
15 years 2 months ago
Augmenting Modern Superscalar Architectures with Configurable Extended Instructions
The instruction sets of general-purpose microprocessors are designed to offer good performance across a wide range of programs. The size and complexity of the instruction sets, how...
Xianfeng Zhou, Margaret Martonosi
ASPLOS
2008
ACM
14 years 11 months ago
Communication optimizations for global multi-threaded instruction scheduling
The recent shift in the industry towards chip multiprocessor (CMP) designs has brought the need for multi-threaded applications to mainstream computing. As observed in several lim...
Guilherme Ottoni, David I. August