PVM and other distributed computing systems have enabled the use of networks of workstations for parallel computation, but their approach of treating all networks as collections o...
We introduce Membrane, a set of changes to the operating system to support restartable file systems. Membrane allows an operating system to tolerate a broad class of file system f...
Operating system lockup errors can render a computer unusable by preventing the execution other programs. Watchdog timers can be used to recover from a lockup by resetting the pro...
Francis M. David, Jeffrey C. Carlyle, Roy H. Campb...
We study how several collective operations like broadcast, reduction, scan, etc. can be composed efficiently in complex parallel programs. Our specific contributions are: (1) a fo...
Sergei Gorlatch, Christoph Wedler, Christian Lenga...
Multicasts are a powerful means to implement coordinated operations on distributed data-sets as well as synchronized reductions of multiple computed results. In this paper we prese...