Sciweavers

SAC 2006 (ACM)
Combining supervised and unsupervised monitoring for fault detection in distributed computing systems
Fast and accurate fault detection is becoming an essential component of management software for mission critical systems. A good fault detector makes it possible to initiate repair a...
Haifeng Chen, Guofei Jiang, Cristian Ungureanu, Ke...

ICPR 2006 (IEEE)
Fault Detection in Distributed Systems by Representative Subspace Mapping
The high dimensionality of system observation, together with the frequent changes of system normal behavior resulting from workload variations, makes fault detection very difficu...
Haifeng Chen, Guofei Jiang, Kenji Yoshihira

WOSP 2004 (ACM)
Computing the performability of layered distributed systems with a management architecture
This paper analyzes the performability of client-server applications that use a separate fault management architecture for monitoring and controlling of the status of the applicat...
Olivia Das, C. Murray Woodside

HPDC 1998 (IEEE)
A Fault Detection Service for Wide Area Distributed Computations
The potential for faults in distributed computing systems is a significant complicating factor for application developers. While a variety of techniques exist for detecting and co...
Paul Stelling, Ian T. Foster, Carl Kesselman, Crai...

ICDCS 2003 (IEEE)
Software Fault Tolerance of Distributed Programs Using Computation Slicing
Writing correct distributed programs is hard. In spite of extensive testing and debugging, software faults persist even in commercial grade software. Many distributed systems, esp...
Neeraj Mittal, Vijay K. Garg