To better manage the ever-increasing complexity of LAM/MPI, we have created a lightweight component architecture for it that is specifically designed for high-performance message p...
In this paper, we provide examples of how thread-level speculation (TLS) simplifies manual parallelization and enhances its performance. A number of techniques for manual parallel...
We present a new method for dynamically detecting potential data races in multithreaded programs. Our method improves on the state of the art in accuracy, in usability, and in ove...
A design pattern is a mechanism for encapsulating the knowledge of experienced designers into a reusable artifact. Parallel design patterns reflect commonly occurring parallel co...
Kai Tan, Duane Szafron, Jonathan Schaeffer, John A...
Simultaneous multithreading (SMT) represents a fundamental shift in processor capability. SMT's ability to execute multiple threads simultaneously within a single CPU offers ...
In programming high performance applications, shared address-space platforms are preferable for fine-grained computation, while distributed address-space platforms are more suita...
Sensor networks are long-running computer systems with many sensing/compute nodes working to gather information about their environment, process and fuse that information, and in ...
Gather and scatter are data redistribution functions of longstanding importance to high-performance computing. In this paper, we present a highly general array operator with power...
Steven J. Deitz, Bradford L. Chamberlain, Sung-Eun...
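For readers unfamiliar with the two operations named in the abstract above, here is a minimal sketch of their standard element-wise semantics. Python is an assumption (the listing contains no code of its own), the helper names `gather` and `scatter` are hypothetical, and this is not the generalized array operator the paper proposes. Gather reads through an index map (out[i] = src[idx[i]]); scatter writes through one (dst[idx[i]] = values[i]).

```python
# Hypothetical illustration of classic element-wise gather and scatter.
# This is NOT the array operator described in the paper above.

def gather(src, idx):
    """Return a list out where out[i] = src[idx[i]] (read through an index map)."""
    return [src[j] for j in idx]


def scatter(dst, idx, values):
    """Return a copy of dst with dst[idx[i]] = values[i] (write through an index map).

    If idx contains duplicates, later writes overwrite earlier ones in this
    sequential sketch; a parallel implementation must choose its own policy.
    """
    out = list(dst)
    for j, v in zip(idx, values):
        out[j] = v
    return out


if __name__ == "__main__":
    data = [10, 20, 30, 40, 50]
    idx = [4, 0, 2]
    print(gather(data, idx))                 # [50, 10, 30]
    print(scatter([0] * 5, idx, [1, 2, 3]))  # [2, 0, 3, 0, 1]
```

The duplicate-index case is where the two operations differ most in practice: gather with repeated indices simply replicates values, while scatter with repeated indices creates write conflicts that a parallel runtime must resolve.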
Large-scale cluster-based Internet services often host partitioned datasets to provide incremental scalability. The aggregation of results produced from multiple partitions is a f...