Sciweavers

PPOPP
2010
ACM
Modeling advanced collective communication algorithms on Cell-based systems
This paper presents and validates performance models for a variety of high-performance collective communication algorithms for systems with Cell processors. The systems modeled in...
Qasim Ali, Samuel P. Midkiff, Vijay S. Pai
PPOPP
2010
ACM
Application heartbeats for software performance and health
Adaptive, or self-aware, computing has been proposed to help application programmers confront the growing complexity of multicore software development. However, existing approache...
Henry Hoffmann, Jonathan Eastep, Marco D. Santambr...
PPOPP
2010
ACM
Modeling transactional memory workload performance
Transactional memory promises to make parallel programming easier than with fine-grained locking, while performing just as well. This performance claim is not always borne out bec...
Donald E. Porter, Emmett Witchel
PPOPP
2010
ACM
Lazy binary-splitting: a run-time adaptive work-stealing scheduler
We present Lazy Binary Splitting (LBS), a user-level scheduler of nested parallelism for shared-memory multiprocessors that builds on existing Eager Binary Splitting work-stealing...
Alexandros Tzannes, George C. Caragea, Rajeev Baru...
PPOPP
2010
ACM
Scalable communication protocols for dynamic sparse data exchange
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such ...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...
PPOPP
2010
ACM
GAMBIT: effective unit testing for concurrency libraries
As concurrent programming becomes prevalent, software providers are investing in concurrency libraries to improve programmer productivity. Concurrency libraries improve productivi...
Katherine E. Coons, Sebastian Burckhardt, Madanlal...
PPOPP
2010
ACM
Debugging programs that use atomic blocks and transactional memory
Ferad Zyulkyarov, Tim Harris, Osman S. Unsal, Adri...
PPOPP
2010
ACM
NOrec: streamlining STM by abolishing ownership records
Drawing inspiration from several previous projects, we present an ownership-record-free software transactional memory (STM) system that combines extremely low overhead with unusua...
Luke Dalessandro, Michael F. Spear, Michael L. Sco...
PPOPP
2010
ACM
Data transformations enabling loop vectorization on multithreaded data parallel architectures
Loop vectorization, a key feature exploited to obtain high performance on Single Instruction Multiple Data (SIMD) vector architectures, is significantly hindered by irregular memo...
Byunghyun Jang, Perhaad Mistry, Dana Schaa, Rodrig...
PPOPP
2010
ACM
Symbolic prefetching in transactional distributed shared memory
We present a static analysis for the automatic generation of symbolic prefetches in a transactional distributed shared memory. A symbolic prefetch specifies the first object to be...
Alokika Dash, Brian Demsky