An efficient memory operations optimization technique for vector loops on Itanium 2 processors


Download www.prism.uvsq.fr

To keep up with a large degree of instruction-level parallelism (ILP), the Itanium 2 cache systems use a complex organization scheme: load/store queues, banking, and interleaving. In this paper, we study the impact of these cache systems on memory instruction scheduling. We demonstrate that, if no care is taken at compile time, the imprecise memory disambiguation mechanism and the banking structure cause severe performance loss, even for very simple regular codes. We also show that grouping the memory operations in a pseudo-vectorized way enables the compiler to generate more effective code for the Itanium 2 processor. The impact of this code optimization technique on register pressure is analyzed for various vectorization schemes.

Keywords: Performance Measurement, Cache Optimization, Memory Access Optimization, Bank Conflicts, Memory Address Disambiguation, Instruction Level Parallelism.
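As a rough, source-level illustration of what grouping memory operations "in a pseudo-vectorized way" can mean, the C sketch below contrasts a simple scalar axpy loop with a version that, for each block of `VF` = 4 elements, writes all loads first, then the arithmetic, then all stores. The axpy kernel, the function names `axpy_naive` and `axpy_grouped`, and the grouping factor `VF` are hypothetical choices for illustration, not the authors' code. The sketch assumes `x` and `y` do not overlap (stated with C99 `restrict`); whether the compiler preserves this grouping in the final Itanium 2 instruction schedule depends on the compiler.

```c
#include <stddef.h>

/* Hypothetical grouping factor: elements handled per block. */
#define VF 4

/* Baseline: each iteration loads x[i] and y[i], then stores y[i]. */
void axpy_naive(size_t n, double a, const double *restrict x,
                double *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Pseudo-vectorized sketch: per block of VF elements, all loads come
   first, then all arithmetic, then all stores.
   Assumes x and y do not overlap (hence restrict). */
void axpy_grouped(size_t n, double a, const double *restrict x,
                  double *restrict y) {
    size_t i = 0;
    for (; i + VF <= n; i += VF) {
        /* Group 1: loads */
        double x0 = x[i], x1 = x[i + 1], x2 = x[i + 2], x3 = x[i + 3];
        double y0 = y[i], y1 = y[i + 1], y2 = y[i + 2], y3 = y[i + 3];

        /* Group 2: arithmetic on values already held in registers */
        y0 = a * x0 + y0;
        y1 = a * x1 + y1;
        y2 = a * x2 + y2;
        y3 = a * x3 + y3;

        /* Group 3: stores */
        y[i] = y0;
        y[i + 1] = y1;
        y[i + 2] = y2;
        y[i + 3] = y3;
    }
    /* Remainder loop when n is not a multiple of VF */
    for (; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

A larger grouping factor exposes more independent loads to the scheduler but keeps more values live at once (2 × `VF` doubles in this kernel), which is the register-pressure trade-off the abstract refers to.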

William Jalby, Christophe Lemuet, Sid Ahmed Ali Touati


Cache Systems | CONCURRENCY 2006 | Instruction Level Parallelism | Memory Disambiguation Mechanism |


» Applications of storage mapping optimization to register promotion

» Exploiting Vector Parallelism in Software Pipelined Loops

» Efficient Selection of Vector Instructions Using Dynamic Programming

» Vectorization for SIMD architectures with alignment constraints

» Performance of OSCAR Multigrain Parallelizing Compiler on SMP Servers

» Highorder stencil computations on multicore clusters

» Optimizing Galois Field Arithmetic for Diverse Processor Architectures and Applications

» Optimizing Sparse Matrix Computations for Register Reuse in SPARSITY

Post Info

Added	11 Dec 2010
Updated	11 Dec 2010
Type	Journal
Year	2006
Where	CONCURRENCY
Authors	William Jalby, Christophe Lemuet, Sid Ahmed Ali Touati


Sciweavers
