Recently a GPU has acquired programmability to perform general purpose computation fast by running ten thousands of threads concurrently. This paper presents a new algorithm for d...
Existing runtime parallelization techniques impose severe performance penalties when a speculative parallelization is attempted and fails. Some techniques require a sequential rest...
Abstract. Nested data-parallel programs often have large memory requirements due to their high degree of parallelism. Piecewise execution is an implementation technique used to min...
This paper explores microarchitecture models for a simultaneous multithreaded processor with multimedia enhancements. We enhance a wide-issue superscalar processor by the simultan...
This paper investigates an approach for statically preventing race conditions in an object-oriented language. The setting of this work is a variant of Gordon and Hankin’s concurr...