Sciweavers

ICS
2011
Tsinghua U.
High performance Linpack benchmark: a fault tolerant implementation without checkpointing
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For l...
Teresa Davies, Christer Karlsson, Hui Liu, Chong D...
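The opening claim of this abstract can be made concrete with a small calculation. The model is an assumption and does not come from the abstract: each processor fails independently, with an exponentially distributed time to failure and the same per-processor MTBF M. Under that model, the probability that at least one of N processors fails during a run of length t is 1 - exp(-N t / M). In the sketch below, the function name `prob_any_failure` and the example numbers are hypothetical.

```python
import math


def prob_any_failure(num_procs: int, runtime_hours: float, node_mtbf_hours: float) -> float:
    """Probability that at least one of num_procs processors fails within runtime_hours.

    Illustrative assumption (not from the paper): failures are independent and
    exponentially distributed, with the same MTBF for every processor.
    """
    # Independent exponential failure rates add up, so the system-wide
    # failure rate is num_procs / node_mtbf_hours.
    system_rate = num_procs / node_mtbf_hours
    # P(no failure in time t) = exp(-rate * t); take the complement.
    return 1.0 - math.exp(-system_rate * runtime_hours)


if __name__ == "__main__":
    # Hypothetical scenario: a 10-hour run on nodes with a 5-year (43,800 h) MTBF.
    for n in (100, 1_000, 10_000, 100_000):
        print(n, round(prob_any_failure(n, 10.0, 43_800.0), 3))
```

The same assumption gives a system MTBF of M / N. This is the shrinking-MTBF trend that the HIPC 2009 checkpointing entry below describes for growing clusters.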
EUROSYS
2011
ACM
Refuse to crash with Re-FUSE
We introduce Re-FUSE, a framework that provides support for restartable user-level file systems. Re-FUSE monitors the user-level file-system and on a crash transparently restart...
Swaminathan Sundararaman, Laxman Visampalli, Andre...
EUROSYS
2011
ACM
Increasing performance in Byzantine fault-tolerant systems with on-demand replica consistency
Traditional agreement-based Byzantine fault-tolerant (BFT) systems process all requests on all replicas to ensure consistency. In addition to the overhead for BFT protocol and sta...
Tobias Distler, Rüdiger Kapitza
TPDS
2010
Dealing with Transient Faults in the Interconnection Network of CMPs at the Cache Coherence Level
The importance of transient faults is predicted to grow due to current technology trends of increased scale of integration. One of the components that will be significantly affecte...
Ricardo Fernández Pascual, José M. G...
SIGOPS
2011
Small trusted primitives for dependable systems
Secure, fault-tolerant distributed systems are difficult to build, to validate, and to operate. Conservative design for such systems dictates that their security and fault toleran...
Petros Maniatis, Byung-Gon Chun
IJWMC
2010
Small-world effects in wireless agent sensor networks
Coverage, fault tolerance and power consumption constraints make optimal placement of mobile sensors or other mobile agents a hard problem. We have developed a model for describin...
Kenneth A. Hawick, Heath A. James
HIPC
2009
Springer
Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture
Large scale compute clusters continue to grow to ever-increasing proportions. However, as clusters and applications continue to grow, the Mean Time Between Failures (MTBF) has redu...
Xiangyong Ouyang, Karthik Gopalakrishnan, Tejus Ga...
FAST
2009
Tiered Fault Tolerance for Long-Term Integrity
Fault-tolerant services typically make assumptions about the type and maximum number of faults that they can tolerate while providing their correctness guarantees; when such a fau...
Byung-Gon Chun, Petros Maniatis, Scott Shenker, Jo...
ICES
2010
Springer
Fault Tolerance of Embryonic Algorithms in Mobile Networks
In previous work the authors have described an approach for building distributed self...
David Lowe, Amir Mujkanovic, Daniele Miorandi, Lid...
PVLDB
2010
Slicing Long-Running Queries
The ability to decompose a complex, long-running query into simpler queries that produce the same result is useful for many scenarios, such as admission control, resource manageme...
Nicolas Bruno, Vivek R. Narasayya, Ravishankar Ram...