The physical layer of most wireless protocols is traditionally implemented in custom hardware to satisfy the heavy computational requirements while keeping power consumption to a ...
Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scot...
Global addressing of shared data simplifies parallel programming and complements message passing models commonly found in distributed memory machines. A number of programming sys...
Beng-Hong Lim, Chi-Chao Chang, Grzegorz Czajkowski...
In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effe...
Graphics processing units (GPUs) provide a low cost platform for accelerating high performance computations. The introduction of new programming languages, such as CUDA and OpenCL...
Amir Hormati, Mehrzad Samadi, Mark Woh, Trevor N. ...
We present an implementation of general FFTs for graphics processing units (GPUs). Unlike most existing GPU FFT implementations, we handle both complex and real data of any size t...