Information on the behavior of programs is essential for deciding the number and nature of functional units in high performance architectures. In this paper, we present studies on...
Lizy Kurian John, Vinod Reddy, Paul T. Hulina, Lee...
Most cluster systems used in high performance computing do not allow process relocation at run-time. Finding an allocation that results in minimal completion time is NP-hard and (n...
Consider any known sequential algorithm for matrix multiplication over an arbitrary ring with time complexity O(N^α), where 2 < α ≤ 3. We show that such an algorithm can be parallelize...
This paper describes a global progressive register allocator, a register allocator that uses an expressive model of the register allocation problem to quickly find a good allocat...
Chip-Multi-Processors (CMP) utilize multiple energy-efficient Processing Elements (PEs) to deliver high performance while maintaining an efficient ratio of performance to energy-c...