Task graph scheduling has been found effective in performance prediction and optimization of parallel applications. A number of static scheduling algorithms have been proposed for...
High-performance computing clusters running longlived tasks currently cannot have kernel software updates applied to them without causing system downtime. These clusters miss oppo...
The behavior of a multithreaded program does not depend only on its inputs. Scheduling, memory reordering, timing, and low-level hardware effects all introduce nondeterminism in t...
Tom Bergan, Owen Anderson, Joseph Devietti, Luis C...
Widespread adaptation of shared memory programming for High Performance Computing has been inhibited by a lack of standardization and the resulting portability problems between pl...
Global address space languages like UPC exhibit high performance and portability on a broad class of shared and distributed memory parallel architectures. The most scalable applic...