Sciweavers

240 search results - page 30 / 48
» LUDE: A Distributed Software Library
Sort
View
IPPS
2008
IEEE
15 years 6 months ago
Build to order linear algebra kernels
—The performance bottleneck for many scientific applications is the cost of memory access inside linear algebra kernels. Tuning such kernels for memory efficiency is a complex ...
Jeremy G. Siek, Ian Karlin, Elizabeth R. Jessup
IPPS
2008
IEEE
15 years 6 months ago
Lattice Boltzmann simulation optimization on leading multicore platforms
We present an auto-tuning approach to optimize application performance on emerging multicore architectures. The methodology extends the idea of searchbased performance optimizatio...
Samuel Williams, Jonathan Carter, Leonid Oliker, J...
IPPS
2007
IEEE
15 years 6 months ago
SWARM: A Parallel Programming Framework for Multicore Processors
Due to fundamental physical limitations and power constraints, we are witnessing a radical change in commodity microprocessor architectures to multicore designs. Continued perform...
David A. Bader, Varun Kanade, Kamesh Madduri
IPPS
2002
IEEE
15 years 4 months ago
A SIMD Vectorizing Compiler for Digital Signal Processing Algorithms
Short vector SIMD instructions on recent microprocessors, such as SSE on Pentium III and 4, speed up code but are a major challenge to software developers. We present a compiler t...
Franz Franchetti, Markus Püschel
PPOPP
2012
ACM
13 years 7 months ago
Concurrent breakpoints
In program debugging, reproducibility of bugs is a key requirement. Unfortunately, bugs in concurrent programs are notoriously difficult to reproduce because bugs due to concurre...
Chang-Seo Park, Koushik Sen