As the ever-increasing gap between the speed of processor and the speed of memory has become the cause of one of primary bottlenecks of computer systems, modern architecture system...
Abstract. Understanding program behavior is at the foundation of program optimization. Techniques for automatic recognition of program constructs characterize the behavior of code ...
The multi-level storage architecture has been widely adopted in servers and data centers. However, while prefetching has been shown as a crucial technique to exploit the sequentia...
Data parallel programs are sensitive to the distribution of data across processor nodes. We formulate the reduction of inter-node communication as an optimization on a colored gra...
We present a novel approach to ray tracing execution on commodity graphics hardware using CUDA. We decompose
a standard ray tracing algorithm into several data-parallel stages tha...