We develop real-time scheduling techniques for improving performance and energy for multiprogrammed workloads that scale nonuniformly with increasing thread counts. Multithreaded ...
With the ever-increasing numbers of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across ...
Driven by the market demand for high-definition 3D graphics, commodity graphics processing units (GPUs) have evolved into highly parallel, multi-threaded, many-core processors, whi...
There is building interest in using FPGAs as accelerators for high-performance computing, but existing systems for programming them are so far inadequate. In this paper we propose...
This paper presents and validates methods to extend reuse distance analysis of application locality characteristics to shared-memory multicore platforms by accounting for invalidat...
Derek L. Schuff, Benjamin S. Parsons, Vijay S. Pai