— Extreme scaling practices in silicon technology are quickly leading to integrated circuit components with limited reliability, where phenomena such as early-transistor failures...
Andrea Pellegrini, Kypros Constantinides, Dan Zhan...
High availability is an increasingly important requirement for enterprise systems, often valued more than performance. Systems designed for high availability typically use redunda...
Nidhi Aggarwal, Parthasarathy Ranganathan, Norman ...
The Solaris 10 Operating System includes a number of new features for predictive self-healing. One such feature is the ability of the Fault Management software to diagnose memory ...
Dong Tang, Peter Carruthers, Zuheir Totari, Michae...
Failure analysis (FA) and diagnosis of memory cores plays a key role in system-on-chip (SOC) product development and yield ramp-up. Conventional FA based on bitmaps and the experi...
We propose a simple and practical probabilistic comparison-based model, employing multiple incomplete test concepts, for handling fault location in distributed systems using a Bay...
Yu Lo Cyrus Chang, Leslie C. Lander, Horng-Shing L...