Sciweavers

HPDC 2010 (IEEE)
Multi-GPU volume rendering using MapReduce
In this paper we present a multi-GPU parallel volume rendering implementation built using the MapReduce programming model. We give implementation details of the library, including s...
Jeff A. Stuart, Cheng-Kai Chen, Kwan-Liu Ma, John ...

IPPS 2009 (IEEE)
A cross-input adaptive framework for GPU program optimizations
Recent years have seen a trend toward using graphics processing units (GPUs) as accelerators for general-purpose computing. The inexpensive, single-chip, massively parallel ar...
Yixun Liu, Eddy Z. Zhang, Xipeng Shen

ESCIENCE 2006 (IEEE)
Scientific Workflows: More e-Science Mileage from Cyberinfrastructure
We view scientific workflows as the domain scientist's way to harness cyberinfrastructure for e-Science. Domain scientists are often interested in "end-to-end" fram...
Bertram Ludäscher, Shawn Bowers, Timothy M. M...

EUROPAR 2011 (Springer)
A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures
Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard ...
Emmanuel Agullo, Jack Dongarra, Rajib Nath, Stanim...

IPPS 2000 (IEEE)
Dynamic Data Layouts for Cache-Conscious Factorization of DFT
Effective utilization of cache memories is a key factor in achieving high performance in computing the Discrete Fourier Transform (DFT). Most optimization techniques for computing ...
Neungsoo Park, Dongsoo Kang, Kiran Bondalapati, Vi...