High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communic...
Splatting is widely applied in many areas, including volume, point-based, and image-based rendering. Improvements to splatting, such as eliminating popping and color bleeding, occ...
Jian Huang, Roger Crawfis, Naeem Shareef, Klaus Mu...
There is a growing demand for network devices capable of examining the content of data packets in order to improve network security and provide application-specific services. Most...
Stride prefetching is recognized as an important technique to improve memory access performance. The prior work usually profiles and/or analyzes the program behavior offline, and u...
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of searchbased performance optimizatio...
Samuel Williams, Jonathan Carter, Leonid Oliker, J...