The availability of large-scale computing platforms comprised of tens of thousands of multicore processors motivates the need for the next generation of highly scalable sparse line...
Overlapping computation with communication is a key technique to conceal the effect of communication latency on the performance of parallel applications. MPI is a widely used mess...
Heterogeneous supercomputers with combined general purpose and accelerated CPUs promise to be the future major architecture due to their wideranging generality and superior perfor...
A statistical model for the purpose of logic cell timing analysis in the presence of process variations is presented. A new current-based cell delay model is utilized, which can a...
—The performance bottleneck for many scientific applications is the cost of memory access inside linear algebra kernels. Tuning such kernels for memory efficiency is a complex ...