To be able to fully exploit ever larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementati...
Joshua Hursey, Jeffrey M. Squyres, Timothy Mattox,...
Dynamic Parallel Schedules (DPS) is a flow graph based framework for developing parallel applications on clusters of workstations. The DPS flow graph execution model enables automa...
Sebastian Gerlach, Basile Schaeli, Roger D. Hersch
—Application mobility is an efficient way to mask uneven conditioning and reduce users’ distractions in pervasive environments. However, since mobility brings more dynamism and...
Fault tolerance schemes for mobile agents to survive agent server crash failures are complex since developers normally have no control over remote agent servers. Some solutions mo...
We introduce Re-FUSE, a framework that provides support for restartable user-level file systems. Re-FUSE monitors the user-level file-system and on a crash transparently restart...