Sciweavers

ISCA
2006
IEEE
15 years 4 months ago
SODA: A Low-power Architecture For Software Radio
The physical layer of most wireless protocols is traditionally implemented in custom hardware to satisfy the heavy computational requirements while keeping power consumption to a ...
Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scot...
PPOPP
1997
ACM
15 years 2 months ago
Performance Implications of Communication Mechanisms in All-Software Global Address Space Systems
Global addressing of shared data simplifies parallel programming and complements message passing models commonly found in distributed memory machines. A number of programming sys...
Beng-Hong Lim, Chi-Chao Chang, Grzegorz Czajkowski...
ASPLOS
1994
ACM
15 years 2 months ago
Compiler Optimizations for Improving Data Locality
In the past decade, processor speed has become significantly faster than memory speed. Small, fast cache memories are designed to overcome this discrepancy, but they are only effe...
Steve Carr, Kathryn S. McKinley, Chau-Wen Tseng
ASPLOS
2011
ACM
14 years 1 month ago
Sponge: portable stream programming on graphics engines
Graphics processing units (GPUs) provide a low cost platform for accelerating high performance computations. The introduction of new programming languages, such as CUDA and OpenCL...
Amir Hormati, Mehrzad Samadi, Mark Woh, Trevor N. ...
ICMCS
2008
IEEE
15 years 4 months ago
Fast computation of general Fourier Transforms on GPUs
We present an implementation of general FFTs for graphics processing units (GPUs). Unlike most existing GPU FFT implementations, we handle both complex and real data of any size t...
Brandon Lloyd, Chas Boyd, Naga K. Govindaraju