Sciweavers

IEEEPACT
2002
IEEE

Optimizing Loop Performance for Clustered VLIW Architectures

13 years 9 months ago
Optimizing Loop Performance for Clustered VLIW Architectures
Modern embedded systems often require high degrees of instruction-level parallelism (ILP) within strict constraints on power consumption and chip cost. Unfortunately, a high-performance embedded processor with high ILP generally puts large demands on register resources, making it difficult to maintain a single, multi-ported register bank. To address this problem, some architectures, e.g. the Texas Instruments TMS320C6x, partition the register bank into multiple banks that are each directly connected only to a subset of functional units. These functional unit/register bank groups are called clusters. Clustered architectures require that either copy operations or delay slots be inserted when an operation accesses data stored on a different cluster. In order to generate excellent code for such architectures, the compiler must not only spread the computation across clusters to achieve maximum parallelism, but also must limit the effects of intercluster data transfers. Loop unrolling and ...
Yi Qian, Steve Carr, Philip H. Sweany
Added 15 Jul 2010
Updated 15 Jul 2010
Type Conference
Year 2002
Where IEEEPACT
Authors Yi Qian, Steve Carr, Philip H. Sweany
Comments (0)