Through the algorthmic design patterns of data parallelism and task parallelism, the graphics processing unit (GPU) offers the potential to vastly accelerate discovery and innovat...
Jeremy S. Archuleta, Yong Cao, Thomas Scogland, Wu...
Increasing focus on power dissipation issues in current microprocessors has led to a host of proposals for clock gating and other power-saving techniques. While generally effectiv...
We present predictive performance models of two of the petascale applications, S3D and GTC, from the DOE Office of Science workload. We outline the development of these models and...
Replacing functional units of an extensible processor with reconfigurable functional units enhances performance and flexibility of processors to execute custom instructions. That ...
This paper presents a new partitioning algorithm to perform matrix multiplication on two interconnected heterogeneous processors. Data is partitioned in a way which minimizes the ...