Tiling is a widely used loop transformation for exposing/exploiting parallelism and data locality. Effective use of tiling requires selection and tuning of the tile sizes. This is...
With the current trend toward multicore architectures, improved execution performance can no longer be obtained via traditional single-thread instruction level parallelism (ILP), ...
Occam is a parallel processing language designed by a team at INMOS in conjunction with the design of the transputer processor, and based on Sir T. Hoare's ideas of Communica...
The Cell BE processor provides both scalable computation power and flexibility, and it is already being adopted for many computational intensive applications like aerospace, defens...
Studying program behavior is a central component in architectural designs. In this paper, we study and exploit one aspect of program behavior, the behavior repetition, to expedite...