Sciweavers

5523 search results - page 957 / 1105
» Improving application performance with hardware data structu...
Sort
View
CODES
2000
IEEE
15 years 8 months ago
Memory architecture for efficient utilization of SDRAM: a case study of the computation/memory access trade-off
This paper discusses the trade-off between calculations and memory accesses in a 3D graphics tile renderer for visualization of data from medical scanners. The performance require...
Thomas Gleerup, Hans Holten-Lund, Jan Madsen, Stee...
SOSP
2009
ACM
16 years 1 months ago
ODR: output-deterministic replay for multicore debugging
Reproducing bugs is hard. Deterministic replay systems address this problem by providing a high-fidelity replica of an original program run that can be repeatedly executed to zer...
Gautam Altekar, Ion Stoica
SI3D
2010
ACM
15 years 11 months ago
Parallel Banding Algorithm to compute exact distance transform with the GPU
We propose a Parallel Banding Algorithm (PBA) on the GPU to compute the exact Euclidean Distance Transform (EDT) for a binary image in 2D and higher dimensions. Partitioning the i...
Thanh-Tung Cao, Ke Tang, Anis Mohamed, Tiow Seng T...
PPPJ
2009
ACM
15 years 11 months ago
Automatic parallelization for graphics processing units
Accelerated graphics cards, or Graphics Processing Units (GPUs), have become ubiquitous in recent years. On the right kinds of problems, GPUs greatly surpass CPUs in terms of raw ...
Alan Leung, Ondrej Lhoták, Ghulam Lashari
IPPS
2007
IEEE
15 years 10 months ago
A Power-Aware Prediction-Based Cache Coherence Protocol for Chip Multiprocessors
Snoopy cache coherence protocols broadcast requests to all nodes, reducing the latency of cache to cache transfer misses at the expense of increasing interconnect power. We propos...
Ehsan Atoofian, Amirali Baniasadi