Sciweavers

26 search results - page 5 / 6
» Loop Optimization using Hierarchical Compilation and Kernel ...
Sort
View
ISHPC
2003
Springer
13 years 10 months ago
Code and Data Transformations for Improving Shared Cache Performance on SMT Processors
Simultaneous multithreaded processors use shared on-chip caches, which yield better cost-performance ratios. Sharing a cache between simultaneously executing threads causes excessi...
Dimitrios S. Nikolopoulos
ICCAD
2008
IEEE
150views Hardware» more  ICCAD 2008»
14 years 2 months ago
Performance estimation and slack matching for pipelined asynchronous architectures with choice
— This paper presents a fast analytical method for estimating the throughput of pipelined asynchronous systems, and then applies that method to develop a fast solution to the pro...
Gennette Gill, Vishal Gupta, Montek Singh
CGO
2010
IEEE
14 years 6 days ago
Automatic creation of tile size selection models
Tiling is a widely used loop transformation for exposing/exploiting parallelism and data locality. Effective use of tiling requires selection and tuning of the tile sizes. This is...
Tomofumi Yuki, Lakshminarayanan Renganarayanan, Sa...
IPPS
2009
IEEE
13 years 12 months ago
High-order stencil computations on multicore clusters
Stencil computation (SC) is of critical importance for broad scientific and engineering applications. However, it is a challenge to optimize complex, highorder SC on emerging clus...
Liu Peng, Richard Seymour, Ken-ichi Nomura, Rajiv ...
CGO
2006
IEEE
13 years 11 months ago
Selecting Software Phase Markers with Code Structure Analysis
Most programs are repetitive, where similar behavior can be seen at different execution times. Algorithms have been proposed that automatically group similar portions of a program...
Jeremy Lau, Erez Perelman, Brad Calder