To minimize the execution time of an iterative application in a heterogeneous parallel computing environment, an appropriate mapping scheme is needed for matching and scheduling th...
Yu-Kwong Kwok, Anthony A. Maciejewski, Howard Jay ...
- Dynamic power consumption in CMOS circuits is usually estimated based on the number of signal transitions. However, when considering glitches, this is not accurate because narrow...
Abstract. While the fetch unit has been identified as one of the major bottlenecks of Simultaneous Multithreading architecture, several fetch schemes were proposed by prior works t...
Code restructuring compilers rely heavily on program analysis techniques to automatically detect data dependences between program statements. Dependences between statement instanc...
Johnnie Birch, Robert A. van Engelen, Kyle A. Gall...
Bank locality can be defined as localizing the number of load/store accesses to a small set of memory banks at a given time. An optimizing compiler can modify a given input code t...
Guilin Chen, Mahmut T. Kandemir, Hendra Saputra, M...