Sciweavers

» Checkpointing and recovery in a transaction-based DSM operat...
USENIX
2007
13 years 8 months ago
Transparent Checkpoint-Restart of Multiple Processes on Commodity Operating Systems
The ability to checkpoint a running application and restart it later can provide many useful benefits including fault recovery, advanced resources sharing, dynamic load balancing...
Oren Laadan, Jason Nieh
ISSRE
2002
IEEE
13 years 11 months ago
The Impact of Recovery Mechanisms on the Likelihood of Saving Corrupted State
Recovery systems must save state before a failure occurs to enable the system to recover from the failure. However, recovery will fail if the recovery system saves any state corru...
Subhachandra Chandra, Peter M. Chen
SRDS
2003
IEEE
13 years 11 months ago
Raptor: Integrating Checkpoints and Thread Migration for Cluster Management
Software distributed shared-memory (SDSM) provides the abstraction necessary to run shared-memory applications on cost-effective parallel platforms such as clusters of workstations. Howeve...
Hazim Shafi, Evan Speight, John K. Bennett
FAST
2010
13 years 8 months ago
Membrane: Operating System Support for Restartable File Systems
We introduce Membrane, a set of changes to the operating system to support restartable file systems. Membrane allows an operating system to tolerate a broad class of file system f...
Swaminathan Sundararaman, Sriram Subramanian, Abhi...
PVLDB
2008
13 years 5 months ago
Fault-tolerant stream processing using a distributed, replicated file system
We present SGuard, a new fault-tolerance technique for distributed stream processing engines (SPEs) running in clusters of commodity servers. SGuard is less disruptive to normal s...
YongChul Kwon, Magdalena Balazinska, Albert G. Gre...