Sciweavers

212 search results - page 1 / 43
Model-based fault localization in large-scale computing syst...
MIDDLEWARE
2010
Springer
13 years 3 months ago
dFault: Fault Localization in Large-Scale Peer-to-Peer Systems
Distributed hash tables (DHTs) have been adopted as a building block for large-scale distributed systems. The upshot of this success is that their robust operation is even more imp...
Pawan Prakash, Ramana Rao Kompella, Venugopalan Ra...
ICDCS
2009
IEEE
14 years 2 months ago
Modeling Probabilistic Measurement Correlations for Problem Determination in Large-Scale Distributed Systems
With the growing complexity in computer systems, it has been a real challenge to detect and diagnose problems in today’s large-scale distributed systems. Usually, the correlatio...
Jing Gao, Guofei Jiang, Haifeng Chen, Jiawei Han
IPTPS
2005
Springer
13 years 10 months ago
Practical Locality-Awareness for Large Scale Information Sharing
Tulip is an overlay for routing, searching and publish-lookup information sharing. It offers a unique combination of the advantages of both structured and unstructured overlays, t...
Ittai Abraham, Ankur Badola, Danny Bickson, Dahlia...
ICPPW
2008
IEEE
13 years 11 months ago
Simulating Failures on Large-Scale Systems
Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene...
Narayan Desai, Ewing L. Lusk, Daniel Buettner, And...
EUROPAR
2008
Springer
13 years 6 months ago
Fault-Tolerant Partial Replication in Large-Scale Database Systems
We investigate a decentralised approach to committing transactions in a replicated database, under partial replication. Previous protocols either reexecute transactions entirely an...
Pierre Sutra, Marc Shapiro