—Computational performance increasingly depends on parallelism, and many systems rely on heterogeneous resources such as GPUs and FPGAs to accelerate computationally intensive ap...
Marcin Bogdanski, Peter R. Lewis, Tobias Becker, X...
To increase the scale and performance of scientific applications, scientists commonly distribute computation over multiple processors. Often without realizing it, file I/O is pa...
Seetharami R. Seelam, Andre Kerstens, Patricia J. ...
Abstract--Multi-core processors with accelerators are becoming commodity components for high-performance computing at scale. While accelerator-based processors have been studied in...
M. Mustafa Rafique, Ali Raza Butt, Dimitrios S. Ni...
Modern DRAM systems rely on memory controllers that employ out-of-order scheduling to maximize row access locality and bank-level parallelism, which in turn maximizes DRAM bandwid...
Dynamic Parallel Schedules (DPS) is a flow graph based framework for developing parallel applications on clusters of workstations. The DPS flow graph execution model enables automa...
Sebastian Gerlach, Basile Schaeli, Roger D. Hersch