Sciweavers

20 search results - page 4 / 4
» Optimizing the memory bandwidth with loop fusion
Sort
View
ICCS
2005
Springer
13 years 10 months ago
Performance and Scalability Analysis of Cray X1 Vectorization and Multistreaming Optimization
Cray X1 Fortran and C/C++ compilers provide a number of loop transformations, notably vectorization and multistreaming, in order to exploit the multistreaming processor (MSP) hard...
Sadaf R. Alam, Jeffrey S. Vetter
PC
2010
190views Management» more  PC 2010»
13 years 3 months ago
High-performance cone beam reconstruction using CUDA compatible GPUs
Compute unified device architecture (CUDA) is a software development platform that allows us to run C-like programs on the nVIDIA graphics processing unit (GPU). This paper prese...
Yusuke Okitsu, Fumihiko Ino, Kenichi Hagihara
ISCA
2005
IEEE
115views Hardware» more  ISCA 2005»
13 years 10 months ago
RENO - A Rename-Based Instruction Optimizer
RENO is a modified MIPS R10000 register renamer that uses map-table “short-circuiting” to implement dynamic versions of several well-known static optimizations: move eliminat...
Vlad Petric, Tingting Sha, Amir Roth
LCPC
2009
Springer
13 years 9 months ago
A Balanced Approach to Application Performance Tuning
Abstract. Current hardware trends place increasing pressure on programmers and tools to optimize scientific code. Numerous tools and techniques exist, but no single tool is a pana...
Souad Koliai, Stéphane Zuckerman, Emmanuel ...
IPPS
2009
IEEE
13 years 11 months ago
High-order stencil computations on multicore clusters
Stencil computation (SC) is of critical importance for broad scientific and engineering applications. However, it is a challenge to optimize complex, highorder SC on emerging clus...
Liu Peng, Richard Seymour, Ken-ichi Nomura, Rajiv ...