Sciweavers

CCGRID
2010
IEEE
14 years 10 months ago
Selective Recovery from Failures in a Task Parallel Programming Model
We present a fault-tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tr...
James Dinan, Arjun Singri, P. Sadayappan, Sriram K...
CCGRID
2008
IEEE
14 years 11 months ago
Fault Tolerance in Cluster Federations with O2P-CF
Fault tolerance is one of the key issues for large scale applications executed on high performance computing systems. In a cluster federation, clusters are gathered to provide hug...
Thomas Ropars, Christine Morin
EUROPAR
2008
Springer
14 years 11 months ago
Fault-Tolerant Partial Replication in Large-Scale Database Systems
We investigate a decentralised approach to committing transactions in a replicated database, under partial replication. Previous protocols either reexecute transactions entirely an...
Pierre Sutra, Marc Shapiro
ICPP
1987
IEEE
15 years 1 month ago
A Software-Based Hardware Fault Tolerance Scheme for Multicomputers
A hardware fault tolerance scheme for large multicomputers executing time-consuming non-interactive applications is described. Error detection and recovery are done mostly by so...
Yuval Tamir, Eli Gafni
ISCA
2011
IEEE
14 years 1 months ago
Sampling + DMR: practical and low-overhead permanent fault detection
With technology scaling, manufacture-time and in-field permanent faults are becoming a fundamental problem. Multi-core architectures with spares can tolerate them by detecting an...
Shuou Nomura, Matthew D. Sinclair, Chen-Han Ho, Ve...