Graphics processing units (GPUs) provide both memory bandwidth and arithmetic performance far greater than that available on CPUs but, because of their Single-Instruction-Multiple...
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to ge...
We compare the performance of systems consisting of one large cluster containing q processors with systems where processors are grouped into k clusters containing u processors eac...
On systems with multi-core processors, the memory access scheduling scheme plays an important role not only in utilizing the limited memory bandwidth but also in balancing the pro...
Well designed domain specific languages enable the easy expression of problems, the application of domain specific optimizations, and dramatic improvements in productivity for t...
Jun Cao, Ayush Goyal, Samuel P. Midkiff, James M. ...