Sciweavers

7 search results - page 1 / 2
» A Parallel for Loop Memory Template for a High Level Synthes...
Sort
View
DSD
2010
IEEE
162views Hardware» more  DSD 2010»
13 years 3 months ago
A Parallel for Loop Memory Template for a High Level Synthesis Compiler
—We propose a parametrized memory template for applications with parallel for loops. The template’s parameters reflect important trade-offs made during system design. The temp...
Craig Moore, Wim Meeus, Harald Devos, Dirk Strooba...
IPPS
1998
IEEE
13 years 8 months ago
Compiler-Optimization of Implicit Reductions for Distributed Memory Multiprocessors
This paper presents reduction recognition and parallel code generationstrategies for distributed-memorymultiprocessors. We describe techniques to recognize a broad range of implic...
Bo Lu, John M. Mellor-Crummey
ISPDC
2010
IEEE
13 years 3 months ago
Resource-Aware Compiler Prefetching for Many-Cores
—Super-scalar, out-of-order processors that can have tens of read and write requests in the execution window place significant demands on Memory Level Parallelism (MLP). Multi- ...
George C. Caragea, Alexandros Tzannes, Fuat Keceli...
ICCS
2005
Springer
13 years 10 months ago
Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization
Cray X1 Fortran and C/C++ compilers provide a number of loop transformations, notably vectorization and multistreaming, in order to exploit the multistreaming processor (MSP) hard...
Sadaf R. Alam, Jeffrey S. Vetter
HPCC
2009
Springer
13 years 9 months ago
On Instruction-Level Method for Reducing Cache Penalties in Embedded VLIW Processors
Usual cache optimisation techniques for high performance computing are difficult to apply in embedded VLIW applications. First, embedded applications are not always well structur...
Samir Ammenouche, Sid Ahmed Ali Touati, William Ja...