Towards scalable reliability frameworks for error prone CMPs

As technology scales and the energy of computation continually approaches thermal equilibrium [1,2], parameter variations and noise levels will lead to larger error rates at various levels of the computation stack. The error rates would be especially high for post-CMOS and nanoelectronic systems as well as for probabilistic [3] and stochastic architectures [4]. N-modular redundancy (NMR) at the core level has been proposed as a way to attain system reliability goals for multicore architectures. While core-level DMR and TMR have been shown to be effective when errors are rare, a large amount of core-level redundancy will be required to attain system reliability goals in the face of high error rates. This makes voting latency and bandwidth significant performance bottlenecks for such systems. In this paper, we present a scalable NMR framework for error-prone chip multiprocessors (CMPs). The framework supports in-network fault tolerance where voting logic is integrated into routers to a...
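To make the redundancy argument in the abstract concrete, here is a minimal sketch of majority voting over replica outputs, together with the textbook probability that a strict majority of N replicas is correct. It is not the paper's framework. The function names `majority_vote` and `nmr_success_probability` are hypothetical, and the model assumes independent replica errors, a fault-free voter, and that a majority of faulty replicas counts as a system failure.

```python
from collections import Counter
from math import comb


def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # Require more than half of the replicas to agree.
    return value if count > len(outputs) // 2 else None


def nmr_success_probability(n, p_error):
    """Probability that a strict majority of n replicas is correct.

    Assumes each replica fails independently with probability p_error
    and that the voter itself never fails (illustrative assumptions).
    """
    needed = n // 2 + 1
    return sum(
        comb(n, k) * (1 - p_error) ** k * p_error ** (n - k)
        for k in range(needed, n + 1)
    )


if __name__ == "__main__":
    print(majority_vote([7, 7, 3]))
    for n in (1, 3, 5, 7):
        print(n, nmr_success_probability(n, 0.1))
```

Under these assumptions, raising the per-replica error rate requires a larger N to reach the same success probability, and each vote then has more replica outputs to collect and compare. This is consistent with the abstract's point that voting latency and bandwidth become bottlenecks.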
Joseph Sloan, Rakesh Kumar
Added 28 May 2010
Updated 28 May 2010
Type Conference
Year 2009