Sciweavers

» Dynamic Data Replication: An Approach to Providing Fault-Tol...
ICDE
2010
IEEE
14 years 4 months ago
Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database
In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared nothing database. Rather than aborting and restarting queries, our s...
Christopher Yang, Christine Yen, Ceryen Tan, Samue...
INFOCOM
2008
IEEE
13 years 11 months ago
Towards Optimal Resource Allocation in Partial-Fault Tolerant Applications
We introduce Zen, a new resource allocation framework that assigns application components to node clusters to achieve high availability for partial-fault tolerant (PFT) applicat...
Nikhil Bansal, Ranjita Bhagwan, Navendu Jain, Yoon...
PPOPP
2006
ACM
13 years 10 months ago
Fast and transparent recovery for continuous availability of cluster-based servers
Recently there has been renewed interest in building reliable servers that support continuous application operation. Besides maintaining system state consistent after a failure, o...
Rosalia Christodoulopoulou, Kaloian Manassiev, Ang...
CCGRID
2006
IEEE
13 years 11 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
CCGRID
2006
IEEE
13 years 11 months ago
ReCon: A Fast and Reliable Replica Retrieval Service for the Data Grid
The Data Grid provides a scalable infrastructure for storage resources and data distribution management. It also supports a variety of scientific applications that require access...
XiaoLi Zhou, Eunsung Kim, Jai Wug Kim, Heon Young ...