Sciweavers

131 search results - page 19 / 27
» Copy Elimination for Parallelizing Compilers
Sort
View
CF
2006
ACM
14 years 11 months ago
Intermediately executed code is the key to find refactorings that improve temporal data locality
The growing speed gap between memory and processor makes an efficient use of the cache ever more important to reach high performance. One of the most important ways to improve cac...
Kristof Beyls, Erik H. D'Hollander
CGO
2010
IEEE
15 years 4 months ago
Automatic creation of tile size selection models
Tiling is a widely used loop transformation for exposing/exploiting parallelism and data locality. Effective use of tiling requires selection and tuning of the tile sizes. This is...
Tomofumi Yuki, Lakshminarayanan Renganarayanan, Sa...
IEEEPACT
2002
IEEE
15 years 2 months ago
A Framework for Parallelizing Load/Stores on Embedded Processors
Many modern embedded processors (esp. DSPs) support partitioned memory banks (also called X-Y memory or dual bank memory) along with parallel load/store instructions to achieve co...
Xiaotong Zhuang, Santosh Pande, John S. Greenland ...
CJ
2006
84views more  CJ 2006»
14 years 9 months ago
Instruction Level Parallelism through Microthreading - A Scalable Approach to Chip Multiprocessors
Most microprocessor chips today use an out-of-order instruction execution mechanism. This mechanism allows superscalar processors to extract reasonably high levels of instruction ...
Kostas Bousias, Nabil Hasasneh, Chris R. Jesshope
ISCA
1995
IEEE
93views Hardware» more  ISCA 1995»
15 years 1 months ago
Optimizing Memory System Performance for Communication in Parallel Computers
Communicationin aparallel systemfrequently involvesmoving data from the memory of one node to the memory of another; this is the standard communication model employedin message pa...
Thomas Stricker, Thomas R. Gross