Sciweavers

23 search results - page 2 / 5
» Recent advances in checkpoint recovery systems
Sort
View
USENIX
2007
13 years 7 months ago
Transparent Checkpoint-Restart of Multiple Processes on Commodity Operating Systems
The ability to checkpoint a running application and restart it later can provide many useful benefits including fault recovery, advanced resources sharing, dynamic load balancing...
Oren Laadan, Jason Nieh
PVLDB
2008
110views more  PVLDB 2008»
13 years 4 months ago
Fault-tolerant stream processing using a distributed, replicated file system
We present SGuard, a new fault-tolerance technique for distributed stream processing engines (SPEs) running in clusters of commodity servers. SGuard is less disruptive to normal s...
YongChul Kwon, Magdalena Balazinska, Albert G. Gre...

Presentation
324views
11 years 11 months ago
Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault
In this paper, we propose a task scheduling al-gorithm for a multicore processor system which reduces the recovery time in case of a single fail-stop failure of a multicore process...

Publication
165views
11 years 10 months ago
Task scheduling algorithm for multicore processor system for minimizing recovery time in case of single node fault
In this paper, we propose a task scheduling algorithm for a multicore processor system which reduces the recovery time in case of a single fail-stop failure of a multicore processo...
Shohei Gotoda, Naoki Shibata and Minoru Ito
CLUSTER
2004
IEEE
13 years 9 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...