Sciweavers

1024 search results for "Fault Tolerance in Decentralized Systems" (page 125 of 205)
IOLTS
2008
IEEE
15 years 4 months ago
Verification and Analysis of Self-Checking Properties through ATPG
Present and future semiconductor technologies are characterized by increasing parameter variations as well as an increasing susceptibility to external disturbances. Transient err...
Marc Hunger, Sybille Hellebrand
ISPA
2004
Springer
15 years 3 months ago
Highly Reliable Linux HPC Clusters: Self-Awareness Approach
Abstract. Current solutions for fault-tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration change...
Chokchai Leangsuksun, Tong Liu, Yudan Liu, Stephen...
HPDC
2000
IEEE
15 years 2 months ago
Robust Resource Management for Metacomputers
In this paper we present a robust software infrastructure for metacomputing. The system is intended to be used by others as a building block for large and powerful computational g...
Jörn Gehring, Achim Streit
ICPADS
2006
IEEE
15 years 3 months ago
Fast Convergence in Self-Stabilizing Wireless Networks
The advent of large-scale multi-hop wireless networks highlights problems of fault tolerance and scale in distributed systems, motivating designs that autonomously recover from tra...
Nathalie Mitton, Eric Fleury, Isabelle Guér...
CCGRID
2006
IEEE
15 years 1 months ago
IPMI-based Efficient Notification Framework for Large Scale Cluster Computing
The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and even...
Chokchai Leangsuksun, Tirumala Rao, Anand Tikoteka...