Sciweavers

347 search results - page 50 / 70
» Caching processor general registers
Sort
View
PLDI
2005
ACM
15 years 3 months ago
Demystifying on-the-fly spill code
Modulo scheduling is an effective code generation technique that exploits the parallelism in program loops by overlapping iterations. One drawback of this optimization is that reg...
Alex Aletà, Josep M. Codina, Antonio Gonz&a...
IEEEPACT
2002
IEEE
15 years 2 months ago
Optimizing Loop Performance for Clustered VLIW Architectures
Modern embedded systems often require high degrees of instruction-level parallelism (ILP) within strict constraints on power consumption and chip cost. Unfortunately, a high-perfo...
Yi Qian, Steve Carr, Philip H. Sweany
116
Voted
EUROPAR
2010
Springer
14 years 10 months ago
Optimized Dense Matrix Multiplication on a Many-Core Architecture
Abstract. Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
82
Voted
VLSID
2007
IEEE
154views VLSI» more  VLSID 2007»
15 years 10 months ago
Application Specific Datapath Extension with Distributed I/O Functional Units
Performance of an application can be improved through augmenting the processor with Application specific Functional Units (AFUs). Usually a cluster of operations identified from th...
Nagaraju Pothineni, Anshul Kumar, Kolin Paul
GECCO
2004
Springer
148views Optimization» more  GECCO 2004»
15 years 3 months ago
A Multi-objective Approach to Configuring Embedded System Architectures
Portable embedded systems are being driven by consumer demands to be thermally efficient, perform faster, and have longer battery life. To design such a system, various hardware un...
James Northern III, Michael A. Shanblatt