Exploiting compile time knowledge to improve memory bandwidth can produce noticeable improvements at run-time [13, 1]. Allocating the data structure [13] to separate memories when...
Automatic parallelization is a promising strategy to improve application performance in the multicore era. However, common programming practices such as the reuse of data structur...
Nick P. Johnson, Hanjun Kim, Prakash Prabhu, Ayal ...
High performance computing on parallel architectures currently uses different approaches depending on the hardory model of the architecture, the abstraction level of the programmi...
A microprocessor's performance is fundamentally limited by the rate at which it can resolve branch mispredictions. Control independence (CI) architectures look for useful con...
Kshitiz Malik, Mayank Agarwal, Sam S. Stone, Kevin...
High-performance computing faces considerable change as the Internet and the Grid mature. Applications that once were tightly-coupled and monolithic are now decentralized, with co...
Patrick Widener, Greg Eisenhauer, Karsten Schwan, ...