To achieve high-performance on multicore systems, sharedmemory parallel languages must efficiently implement atomic operations. The commonly used and studied paradigms for atomici...
This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information ...
Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. P...
—Current leadership-class machines suffer from a significant imbalance between their computational power and their I/O bandwidth. While Moore’s law ensures that the computatio...
Nawab Ali, Philip H. Carns, Kamil Iskra, Dries Kim...
— The Cell Broadband Engine (BE) is a heterogeneous multicore processor, combining a general-purpose POWER architecture core with eight independent single-instructionmultiple-dat...
Sadaf R. Alam, Jeremy S. Meredith, Jeffrey S. Vett...
With the advent of chip-multiprocessors, we are faced with the challenge of parallelizing performance-critical software. Transactional memory (TM) has emerged as a promising progr...
Marek Olszewski, Jeremy Cutler, J. Gregory Steffan