A router design for torus networks that significantly reduces message latency over traditional wormhole routers is presented in this paper. This new router implements virtual cut-...
Increases in instruction level parallelism are needed to exploit the potential parallelism available in future wide issue architectures. Predicated execution is an architectural m...
Lori Carter, Beth Simon, Brad Calder, Larry Carter...
In parallel processing systems, a fundamental consideration is the maximization of system performance through task mapping. A good allocation strategy may improve resource utilizat...
S. Mounir Alaoui, Ophir Frieder, Tarek A. El-Ghaza...
Both inherently sequential code and limitations of analysis techniques prevent full parallelization of many applications by parallelizing compilers. Amdahl's Law tells us tha...
In this paper we propose the use of an optical network not only as the communication medium, but also as a system-wide cache for the shared data in a multiprocessor. More specifica...