Due to fundamental physical limitations and power constraints, we are witnessing a radical change in commodity microprocessor architectures to multicore designs. Continued perform...
Most software for embedded systems, including digital signal processing systems, is coded in assembly language. For both understanding the software and for reverse compiling it to...
This paper argues for an implicitly parallel programming model for many-core microprocessors, and provides initial technical approaches towards this goal. In an implicitly paralle...
Wen-mei W. Hwu, Shane Ryoo, Sain-Zee Ueng, John H....
Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetching program instructions in dynamic execution order, dramatically improves inst...
This paper solves the open problem of extracting the maximal number of iterations from a loop that can be executed in parallel on chip multiprocessors. Our algorithm solves it opt...
Duo Liu, Zili Shao, Meng Wang, Minyi Guo, Jingling...