Sciweavers

146 search results - page 2 / 30
» Transparent Checkpoint-Restart of Distributed Applications o...
Sort
View
HIPC
2009
Springer
13 years 2 months ago
Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture
Large scale compute clusters continue to grow to ever-increasing proportions. However, as clusters and applications continue to grow, the Mean Time Between Failures (MTBF) has redu...
Xiangyong Ouyang, Karthik Gopalakrishnan, Tejus Ga...
ICPP
2009
IEEE
13 years 11 months ago
Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems
—Clusters and applications continue to grow in size while their mean time between failure (MTBF) is getting smaller. Checkpoint/Restart is becoming increasingly important for lar...
Xiangyong Ouyang, Karthik Gopalakrishnan, Dhabales...
PVM
2005
Springer
13 years 10 months ago
Scalable Fault Tolerant MPI: Extending the Recovery Algorithm
ct Fault Tolerant MPI (FT-MPI)[6] was designed as a solution to allow applications different methods to handle process failures beyond simple check-point restart schemes. The init...
Graham E. Fagg, Thara Angskun, George Bosilca, Jel...
IPPS
2005
IEEE
13 years 10 months ago
User Transparent Parallel Processing of the 2004 NIST TRECVID Data Set
The Parallel-Horus framework, developed at the University of Amsterdam, is a unique software architecture that allows non-expert parallel programmers to develop fully sequential m...
Frank J. Seinstra, Cees Snoek, Dennis Koelma, Jan-...
PPOPP
2006
ACM
13 years 11 months ago
Fast and transparent recovery for continuous availability of cluster-based servers
Recently there has been renewed interest in building reliable servers that support continuous application operation. Besides maintaining system state consistent after a failure, o...
Rosalia Christodoulopoulou, Kaloian Manassiev, Ang...