We present a dynamic optimization technique, thread warping, that uses a single processor on a multiprocessor system to dynamically synthesize threads into custom accelerator circ...
The design and evaluation of microprocessor architectures is a difficult and time-consuming task. Although small, handcoded microbenchmarks can be used to accelerate performance e...
Abstract: In this paper, we present an acceleration method for texture-based ray casting on the compute uniļ¬ed device architecture (CUDA) compatible graphics processing unit (GPU...
Despite a burgeoning demand for parallel programs, the tools available to developers working on shared-memory multicore processors have lagged behind. One reason for this is the l...
Marek Olszewski, Qin Zhao, David Koh, Jason Ansel,...
In this paper we propose the Merge framework, a general purpose programming model for heterogeneous multi-core systems. The Merge framework replaces current ad hoc approaches to p...
Michael D. Linderman, Jamison D. Collins, Hong Wan...