This paper focuses on generating efficient software pipelined schedules for in-order machines, which we call Converged Trace Schedules. For a candidate loop, we form a string of t...
To construct complete systems on silicon, application specic DSP accelerators are needed to speed up the execution of high throughput DSP algorithms. In this paper, a methodology...
Patrick Schaumont, Bart Vanthournout, Ivo Bolsens,...
Aggressive compiler optimizations such as software pipelining and loop invariant code motion can significantly improve application performance, but these transformations often re...
Chris Zimmer, Stephen Roderick Hines, Prasad Kulka...
Multi-core architectures are increasingly being adopted in the design of emerging complex embedded systems. Key issues of designing such systems are on-chip interconnects, memory a...
Ying Yi, Wei Han, Xin Zhao, Ahmet T. Erdogan, Tugh...
We present an architecture that features dynamic multithreading execution of a single program. Threads are created automatically by hardware at procedure and loop boundaries and e...