Sciweavers

ASPLOS
2008
ACM

Adapting to intermittent faults in multicore systems

Future multicore processors will be more susceptible to a variety of hardware failures. In particular, intermittent faults, caused in part by manufacturing, thermal, and voltage variations, can cause bursts of frequent faults that last from several cycles to several seconds or more. Due to practical limitations of circuit techniques, cost-effective reliability will likely require the ability to temporarily suspend execution on a core during periods of intermittent faults. We investigate three of the most obvious techniques for adapting to the dynamically changing resource availability caused by intermittent faults, and demonstrate their different system-level implications. We show that system software reconfiguration has very high overhead, that temporarily pausing execution on a faulty core can lead to cascading livelock, and that using spare cores has high fault-free cost. To remedy these and other drawbacks of the three baseline techniques, we propose using a thin hardware/firmware l...
Added 12 Oct 2010
Updated 12 Oct 2010
Type Conference
Year 2008
Where ASPLOS
Authors Philip M. Wells, Koushik Chakraborty, Gurindar S. Sohi