Modulo scheduling is an effective code generation technique that exploits the parallelism in program loops by overlapping iterations. One drawback of this optimization is that reg...
Modern embedded systems often require high degrees of instruction-level parallelism (ILP) within strict constraints on power consumption and chip cost. Unfortunately, a high-perfo...
Abstract. Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
Performance of an application can be improved through augmenting the processor with Application specific Functional Units (AFUs). Usually a cluster of operations identified from th...
Portable embedded systems are being driven by consumer demands to be thermally efficient, perform faster, and have longer battery life. To design such a system, various hardware un...