
SC
2005
ACM

Fault Tolerance Techniques for the Merrimac Streaming Supercomputer

As device scales shrink, higher transistor counts become available, while soft errors, even in logic, become a major concern. A new class of architectures, such as Merrimac and the IBM Cell, takes advantage of the higher transistor count by exposing control, communication, and a large number of functional units at the architectural level, thus achieving high performance and efficiency. This paper explores soft-error fault tolerance in the context of these compute-intensive architectures, which differ significantly from their control-intensive CPU counterparts. The main goal of the proposed schemes for Merrimac is to conserve the critical and costly off-chip bandwidth and on-chip storage resources while maintaining high peak and sustained performance. We achieve this by allowing for reconfigurability and relying on programmer input. The processor is run either at full peak performance employing software fault-tolerance methods, or at reduced performance with hardware redundancy. We present...
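The abstract does not spell out which software fault-tolerance methods are used, so the following is only a generic illustration of one common software technique, duplicate-and-compare (redundant execution), applied to a stream kernel. It is a minimal sketch under stated assumptions, not the paper's scheme: the names run_redundant, scale_kernel, and FlakyKernel are hypothetical, and FlakyKernel merely perturbs selected outputs to emulate a transient soft error.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical illustration only: these names are not from the Merrimac paper
# or any Merrimac toolchain.
Kernel = Callable[[Sequence[float]], List[float]]


def run_redundant(kernel: Kernel, stream: Sequence[float],
                  max_retries: int = 2) -> List[float]:
    """Software duplicate-and-compare: run the kernel twice on the same input
    stream and accept the result only if both copies agree. On a mismatch
    (a possible transient soft error) the pair is re-executed, up to
    max_retries additional times."""
    for _ in range(max_retries + 1):
        first = kernel(stream)
        second = kernel(stream)
        if first == second:
            return first
    raise RuntimeError("outputs still disagree after retries; fault may be persistent")


def scale_kernel(stream: Sequence[float]) -> List[float]:
    """A trivial, deterministic stream kernel: multiply every element by 2."""
    return [2.0 * x for x in stream]


class FlakyKernel:
    """Wraps a kernel and corrupts the first output element on selected call
    indices (0-based) to emulate a transient soft error. For testing only."""

    def __init__(self, kernel: Kernel, faulty_calls: Set[int]) -> None:
        self.kernel = kernel
        self.faulty_calls = faulty_calls
        self.calls = 0

    def __call__(self, stream: Sequence[float]) -> List[float]:
        out = list(self.kernel(stream))
        if self.calls in self.faulty_calls and out:
            out[0] += 1.0  # inject a single-element error
        self.calls += 1
        return out


if __name__ == "__main__":
    # Call 1 is corrupted, so the first pair disagrees and the retry succeeds.
    flaky = FlakyKernel(scale_kernel, faulty_calls={1})
    print(run_redundant(flaky, [1.0, 2.0, 3.0]))
```

Duplicating every kernel doubles the work, which is why, under this assumption, a programmer might protect only the kernels where error detection matters; the abstract's alternative is to leave such protection to hardware redundancy at reduced performance.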
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where SC
Authors Mattan Erez, Nuwan Jayasena, Timothy J. Knight, William J. Dally