Sciweavers

11 search results - page 2 / 3
» Optimizing the Memory Bandwidth with Loop Morphing
Sort
View
CODES
2000
IEEE
13 years 10 months ago
Co-design of interleaved memory systems
Memory interleaving is a cost-efficient approach to increase bandwidth. Improving data access locality and reducing memory access conflicts are two important aspects to achieve hi...
Hua Lin, Wayne Wolf
IEEEPACT
2006
IEEE
13 years 11 months ago
Compiling for stream processing
This paper describes a compiler for stream programs that efficiently schedules computational kernels and stream memory operations, and allocates on-chip storage. Our compiler uses...
Abhishek Das, William J. Dally, Peter R. Mattson
ICCS
2005
Springer
13 years 11 months ago
Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization
Cray X1 Fortran and C/C++ compilers provide a number of loop transformations, notably vectorization and multistreaming, in order to exploit the multistreaming processor (MSP) hard...
Sadaf R. Alam, Jeffrey S. Vetter
PC
2010
190views Management» more  PC 2010»
13 years 3 months ago
High-performance cone beam reconstruction using CUDA compatible GPUs
Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the nVIDIA graphics processing unit (GPU). This paper prese...
Yusuke Okitsu, Fumihiko Ino, Kenichi Hagihara
LCPC
2009
Springer
13 years 10 months ago
A Balanced Approach to Application Performance Tuning
Abstract. Current hardware trends place increasing pressure on programmers and tools to optimize scientific code. Numerous tools and techniques exist, but no single tool is a pana...
Souad Koliai, Stéphane Zuckerman, Emmanuel ...