Sciweavers

500 search results - page 89 / 100
» Compiling SA-C Programs to FPGAs: Performance Results
PPOPP
2010
ACM
15 years 9 months ago
Using data structure knowledge for efficient lock generation and strong atomicity
To achieve high performance on multicore systems, shared-memory parallel languages must efficiently implement atomic operations. The commonly used and studied paradigms for atomici...
Gautam Upadhyaya, Samuel P. Midkiff, Vijay S. Pai
ISCA
1995
IEEE
15 years 3 months ago
A Comparison of Full and Partial Predicated Execution Support for ILP Processors
One can effectively utilize predicated execution to improve branch handling in instruction-level parallel processors. Although the potential benefits of predicated execution are hig...
Scott A. Mahlke, Richard E. Hank, James E. McCormi...
ICS
2007
Tsinghua U.
15 years 5 months ago
Optimization of data prefetch helper threads with path-expression based statistical modeling
This paper investigates helper threads that improve performance by prefetching data on behalf of an application’s main thread. The focus is data prefetch helper threads that lac...
Tor M. Aamodt, Paul Chow
IPPS
1999
IEEE
15 years 4 months ago
A Graph Based Framework to Detect Optimal Memory Layouts for Improving Data Locality
In order to extract high levels of performance from modern parallel architectures, the effective management of deep memory hierarchies is very important. While architectural advan...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...
DAC
2008
ACM
15 years 1 months ago
Application mapping for chip multiprocessors
The problem attacked in this paper is one of automatically mapping an application onto a Network-on-Chip (NoC) based chip multiprocessor (CMP) architecture in a locality-aware fas...
Guangyu Chen, Feihui Li, Seung Woo Son, Mahmut T. ...