Sciweavers

87 search results - page 12 / 18
» Improving the Memory Bandwidth Utilization Using Loop Transf...
Sort
View
CASES
2008
ACM
15 years 1 months ago
Efficient vectorization of SIMD programs with non-aligned and irregular data access hardware
Automatic vectorization of programs for partitioned-ALU SIMD (Single Instruction Multiple Data) processors has been difficult because of not only data dependency issues but also n...
Hoseok Chang, Wonyong Sung
IPPS
2000
IEEE
15 years 4 months ago
Dynamic Data Layouts for Cache-Conscious Factorization of DFT
Effective utilization of cache memories is a key factor in achieving high performance in computing the Discrete Fourier Transform (DFT). Most optimizationtechniques for computing ...
Neungsoo Park, Dongsoo Kang, Kiran Bondalapati, Vi...
OSDI
2006
ACM
15 years 12 months ago
Connection Handoff Policies for TCP Offload Network Interfaces
This paper presents three policies for effectively utilizing TCP offload network interfaces that support connection handoff. These policies allow connection handoff to reduce the ...
Hyong-youb Kim, Scott Rixner
ICPP
2009
IEEE
15 years 6 months ago
Accelerating Lattice Boltzmann Fluid Flow Simulations Using Graphics Processors
—Lattice Boltzmann Methods (LBM) are used for the computational simulation of Newtonian fluid dynamics. LBM-based simulations are readily parallelizable; they have been implemen...
Peter Bailey, Joe Myre, Stuart D. C. Walsh, David ...
93
Voted
IWMM
2009
Springer
130views Hardware» more  IWMM 2009»
15 years 6 months ago
A component model of spatial locality
Good spatial locality alleviates both the latency and bandwidth problem of memory by boosting the effect of prefetching and improving the utilization of cache. However, convention...
Xiaoming Gu, Ian Christopher, Tongxin Bai, Chengli...