Shared memory multiprocessors play an increasingly important role in enterprise and scientific computing facilities. Remote misses limit the performance of shared memory applicat...
Cache hierarchies have been traditionally designed for usage by a single application, thread or core. As multi-threaded (MT) and multi-core (CMP) platform architectures emerge and...
Abstract. In SMT processors several threads run simultaneously to increase available ILP, sharing but competing for resources. The instruction fetch policy plays a key role, determ...
While clusters of commodity servers and switches are the most popular form of large-scale parallel computers, many programs are not easily parallelized for execution upon them. In...
Hanjun Kim, Arun Raman, Feng Liu, Jae W. Lee, Davi...
This paper describes a new approach to finding performance bottlenecks in shared-memory parallel programs and its embodiment in the Paradyn Parallel Performance Tools running with...