High-performance computing clusters running long-lived tasks currently cannot have kernel software updates applied without causing system downtime. These clusters miss oppo...
With general-purpose programmable GPUs becoming increasingly popular, automated tools are needed to bridge the gap between achievable performance from highly parallel ar...
Exploiting locality at run time complements compiler-based approaches for applications with dynamic memory access patterns. This paper proposes a memory-layout ...
The Cell BE processor provides both scalable computation power and flexibility, and it is already being adopted for many computationally intensive applications like aerospace, defens...
Although chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded a...
Marek Olszewski, Jason Ansel, Saman P. Amarasinghe