Many parallel applications from scientific computing use MPI global communication operations to collect or distribute data. Since the execution times of these communication opera...
This paper presents a performance analysis of an accelerated 2-D rigid image registration implementation that employs the Compute Unified Device Architecture (CUDA) programming e...
This paper presents a software architecture for hardware fault tolerance based on loosely-synchronized, redundant virtual machines (LSRVM). LSRVM will provide high levels of relia...
Recent studies of irregular applications such as finite-element mesh generators and data-clustering codes have shown that these applications have a generalized data parallelism ar...
Building the hardware for a high-performance distributed computer system is a lot easier than building its software. In this paper we describe a model for programtributed systems ...
Andrew S. Tanenbaum, Henri E. Bal, M. Frans Kaasho...