The behavior of many algorithms is heavily determined by the input data. Furthermore, this often means that multiple and completely different execution paths can be followed, also...
Marc Leeman, Chantal Ykman-Couvreur, David Atienza...
Traditional software controlled data cache prefetching is often ineffective due to the lack of runtime cache miss and miss address information. To overcome this limitation, we imp...
Jiwei Lu, Howard Chen, Rao Fu, Wei-Chung Hsu, Bobb...
The performance benefits of GPU parallelism can be enormous, but unlocking this performance potential is challenging. The applicability and performance of GPU parallelizations is...
Thomas B. Jablin, Prakash Prabhu, James A. Jablin,...
Sophisticated C compiler support for network processors (NPUs) is required to improve their usability and consequently, their acceptance in system design. Nonetheless, high-level ...
Data locality is critical to achievinghigh performance on large-scale parallel machines. Non-local data accesses result in communication that can greatly impact performance. Thus ...