SRDS
1997
IEEE

Fault Detection Using Hints from the Socket Layer

This paper describes a fault detection mechanism that uses the error codes returned by stream sockets to locate process failures. Because these errors are generated automatically whenever a process communicates with a failed process, the mechanism incurs no failure-free overhead. However, some types of faults can only be detected if the surviving processes use certain communication operations. To assess the coverage and latency of the proposed mechanism, faults were injected during the execution of parallel applications. Our results show that in most cases, faults could be detected using only the errors from the socket layer. Depending on the type of fault injected, detection occurred within an interval ranging from a few milliseconds to less than 9 minutes.
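The abstract does not list which socket errors the mechanism treats as failure hints. The minimal sketch below assumes POSIX-style errno values and Python's standard `socket` module. It shows the general idea of turning errors that already occur on a stream socket into failure suspicions without adding any extra messages. The `FAILURE_ERRNOS` set and the `classify`, `probe_send`, and `probe_recv` helpers are hypothetical illustrations and are not the authors' implementation.

```python
import errno
import socket

# Hypothetical set of errno values that a stream socket may report when the
# peer process (or its host) has failed. Illustrative assumption only; this
# is not the list used in the paper.
FAILURE_ERRNOS = {
    errno.ECONNRESET,    # peer reset the connection
    errno.EPIPE,         # write on a connection the peer has already closed
    errno.ECONNREFUSED,  # nothing is listening at the peer's address anymore
    errno.ETIMEDOUT,     # retransmissions gave up (e.g. peer host is down)
    errno.EHOSTUNREACH,  # no route to the peer's host
}


def classify(exc: OSError) -> str:
    """Map a socket error to 'peer-failure' if it hints that the peer failed."""
    return "peer-failure" if exc.errno in FAILURE_ERRNOS else "other"


def probe_send(sock: socket.socket, payload: bytes = b"x") -> str:
    """Attempt a normal send; report a suspected failure instead of raising."""
    try:
        sock.sendall(payload)
        return "ok"
    except OSError as exc:
        return classify(exc)


def probe_recv(sock: socket.socket, bufsize: int = 4096) -> str:
    """Attempt a normal receive; an empty read means the peer closed its end."""
    try:
        data = sock.recv(bufsize)
    except OSError as exc:
        return classify(exc)
    return "peer-closed" if data == b"" else "ok"


if __name__ == "__main__":
    # socketpair() returns two connected stream sockets (AF_UNIX on POSIX).
    a, b = socket.socketpair()
    b.close()              # simulate the peer process exiting
    print(probe_recv(a))   # the kernel delivers EOF from the closed peer
    print(probe_send(a))   # writing to the closed peer fails (EPIPE on Linux)
    a.close()
```

Python ignores `SIGPIPE` by default, so a write to a closed peer surfaces as `BrokenPipeError` (errno `EPIPE`) instead of killing the process. In general TCP terms, not as a claim about the paper's experiments, a host that crashes silently sends no FIN or RST. A process blocked in `recv` may then wait indefinitely, while a `send` eventually fails with `ETIMEDOUT`. This is one plausible way in which detection can depend on the operations that the surviving processes perform.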
Nuno Neves, W. Kent Fuchs
Type Conference
Year 1997
Where SRDS