Sciweavers

HOTI
2011
IEEE
12 years 4 months ago
The Common Communication Interface (CCI)
There are many APIs for connecting and exchanging data between network peers. Each interface varies wildly based on metrics including performance, portability, and complexity. S...
Scott Atchley, David Dillow, Galen M. Shipman, Pat...
EUROPAR
2011
Springer
12 years 4 months ago
A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures
Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard ...
Emmanuel Agullo, Jack Dongarra, Rajib Nath, Stanim...
IPPS
2010
IEEE
13 years 2 months ago
An auto-tuning framework for parallel multicore stencil computations
Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural resources, it has hitherto been limited to single kernel instantiations; in addi...
Shoaib Kamil, Cy Chan, Leonid Oliker, John Shalf, ...
PPOPP
2010
ACM
14 years 1 month ago
Lazy binary-splitting: a run-time adaptive work-stealing scheduler
We present Lazy Binary Splitting (LBS), a user-level scheduler of nested parallelism for shared-memory multiprocessors that builds on existing Eager Binary Splitting work-stealing...
Alexandros Tzannes, George C. Caragea, Rajeev Baru...