Although chip-multiprocessors have become the industry standard, developing parallel applications that target them remains a daunting task. Non-determinism, inherent in threaded a...
Marek Olszewski, Jason Ansel, Saman P. Amarasinghe
Abstract — In MIMD (Multiple Instruction stream, Multiple Data stream) execution, each processor has its own state. Although these states are generally considered to be independe...
Abstract—Program performance optimisations, feedbackdirected iterative compilation and auto-tuning systems [1] all assume a fixed estimation of execution time given a fixed inp...
Abdelhafid Mazouz, Sid Ahmed Ali Touati, Denis Bar...
Modern scientific experiments can generate large amounts of data, which may be replicated and distributed across multiple resources to improve application performance and fault to...
Hiding communication latency is an important optimization for parallel programs. Programmers or compilers achieve this by using non-blocking communication primitives and overlappi...