We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of search-based performance optimizatio...
Samuel Williams, Jonathan Carter, Leonid Oliker, J...
The emergence of heterogeneous many-core architectures presents a unique opportunity for delivering order-of-magnitude performance increases to high performance applications by ma...
In the sub-micron technology era, wire delays are becoming much more important than gate delays, making clustered designs particularly attractive. A common form of cl...
Large-scale parallel applications performing global synchronization may spend a significant amount of execution time waiting for the completion of a barrier operation. Conseque...
Parallel file subsystems in today’s high-performance computers adopt many I/O optimization strategies that were designed for distributed systems. These strategies, for instance...
Wei-keng Liao, Kenin Coloma, Alok N. Choudhary, Le...