The high arithmetic rates of media processing applications require architectures with tens to hundreds of functional units, multiple register files, and explicit interconnect betw...
Peter R. Mattson, William J. Dally, Scott Rixner, ...
Divide and conquer algorithms are a good match for modern parallel machines: they tend to have large amounts of inherent parallelism and they work well with caches and deep memory...
Impulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application-specific optimizations through...
John B. Carter, Wilson C. Hsieh, Leigh Stoller, Ma...
This paper presents the Cameron Project 1 , which aims to provide a high level, algorithmic language and optimizing compiler for the development of image processing applications o...
Abstract. A basic prerequisite for parallel programming is a good communication API. The recent interest in using Java for scienti c and engineering application has led to several ...
Mark Baker, Bryan Carpenter, Geoffrey Fox, Sung Ho...