Sciweavers

» Model-based fault localization in large-scale computing syst...
MIDDLEWARE
2010
Springer
13 years 4 months ago
dFault: Fault Localization in Large-Scale Peer-to-Peer Systems
Distributed hash tables (DHTs) have been adopted as a building block for large-scale distributed systems. The upshot of this success is that their robust operation is even more imp...
Pawan Prakash, Ramana Rao Kompella, Venugopalan Ra...
ICDCS
2009
IEEE
14 years 2 months ago
Modeling Probabilistic Measurement Correlations for Problem Determination in Large-Scale Distributed Systems
With the growing complexity in computer systems, it has been a real challenge to detect and diagnose problems in today’s large-scale distributed systems. Usually, the correlatio...
Jing Gao, Guofei Jiang, Haifeng Chen, Jiawei Han
IPTPS
2005
Springer
13 years 11 months ago
Practical Locality-Awareness for Large Scale Information Sharing
Tulip is an overlay for routing, searching and publish-lookup information sharing. It offers a unique combination of the advantages of both structured and unstructured overlays, t...
Ittai Abraham, Ankur Badola, Danny Bickson, Dahlia...
ICPPW
2008
IEEE
14 years 6 days ago
Simulating Failures on Large-Scale Systems
Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene...
Narayan Desai, Ewing L. Lusk, Daniel Buettner, And...
EUROPAR
2008
Springer
13 years 7 months ago
Fault-Tolerant Partial Replication in Large-Scale Database Systems
We investigate a decentralised approach to committing transactions in a replicated database, under partial replication. Previous protocols either reexecute transactions entirely an...
Pierre Sutra, Marc Shapiro