Modern commodity hardware architectures, with their multiple multi-core CPUs and high-speed system interconnects, exhibit tremendous power. In this paper, we study performance lim...
Norbert Egi, Adam Greenhalgh, Mark Handley, Micka&...
In many scientific applications, significant time is spent tuning codes for a particular highperformance architecture. Tuning approaches range from the relatively nonintrusive (...
Albert Hartono, Boyana Norris, Ponnuswamy Sadayapp...
In a Linux cluster, as in any multi-processor system, the inter-processor communication rate is the major limiting factor to its general usefulness. This research is geared toward...
We present an implementation of general FFTs for graphics processing units (GPUs). Unlike most existing GPU FFT implementations, we handle both complex and real data of any size t...
Interprocedural dataflow analysis has a large number of uses for software optimization, maintenance, testing, and verification. For software built with reusable components, the tra...