Sciweavers

» Parallel Cost Analysis of Distributed Systems
SAC
2006
ACM
Combining supervised and unsupervised monitoring for fault detection in distributed computing systems
Fast and accurate fault detection is becoming an essential component of management software for mission-critical systems. A good fault detector makes it possible to initiate repair a...
Haifeng Chen, Guofei Jiang, Cristian Ungureanu, Ke...
LATIN
2010
Springer
Time Complexity of Distributed Topological Self-stabilization: The Case of Graph Linearization
Topological self-stabilization is an important concept to build robust open distributed systems (such as peer-to-peer systems) where nodes can organize themselves into meaningful n...
Dominik Gall, Riko Jacob, Andréa W. Richa, ...
SIGMETRICS
2008
ACM
Co-designing the failure analysis and monitoring of large-scale systems
Large-scale distributed systems provide the backbone for numerous distributed applications and online services. These systems span a multitude of computing nodes located at d...
Abhishek Chandra, Rohini Prinja, Sourabh Jain, Zhi...
HPCA
2005
IEEE
Scatter-Add in Data Parallel Architectures
Many important applications exhibit large amounts of data parallelism, and modern computer systems are designed to take advantage of it. While much of the computation in the multi...
Jung Ho Ahn, Mattan Erez, William J. Dally
HPDC
1998
IEEE
A Fault Detection Service for Wide Area Distributed Computations
The potential for faults in distributed computing systems is a significant complicating factor for application developers. While a variety of techniques exist for detecting and co...
Paul Stelling, Ian T. Foster, Carl Kesselman, Crai...