Abstract--The centralized structures necessary for the extraction of instruction-level parallelism (ILP) are consuming progressively smaller portions of the total die area of chip ...
The novel design of an efficient FPGA interconnection architecture with multiple Switch Boxes (SB) and hardwired connections for realizing data intensive applications (i.e. DSP ap...
DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must be skewed first in order to make ti...
Thread-level speculation (TLS) allows potentially dependent threads to speculatively execute in parallel, thus making it easier for the compiler to extract parallel threads. Howeve...