Transactional memory (TM) simplifies parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. Implementing these properties includes p...
Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan...
Allowing loads to issue out-of-order with respect to earlier unresolved store addresses is very important for extracting parallelism in large-window superscalar processors. Blindl...
Inherent within complex instruction set architectures such as x86 are inefficiencies that do not exist in a simpler ISAs. Modern x86 implementations decode instructions into one o...
Brian Slechta, David Crowe, Brian Fahs, Michael Fe...
Clusters of high-end workstations and PCs are currently used in many application domains to perform large-scale computations or as scalable servers for I/O bound tasks. Although c...
Mispredicted branches and loads that miss in the cache cause the majority of retirement stalls experienced by sequential processors; we call these critical instructions. Despite t...