Sciweavers

DATE
2005
IEEE

Increasing Register File Immunity to Transient Errors

Transient errors are one of the major causes of system downtime in many systems. While prior research has mainly focused on the impact of transient errors on the datapath, caches, and main memories, the register file has largely been neglected. Since the register file is accessed very frequently, the probability of transient errors is high. In addition, errors in it can quickly spread to different parts of the system and cause application crashes or silent data corruption. This paper addresses the reliability of register files in superscalar processors. In particular, we propose to duplicate actively used physical registers in unused physical registers. The rationale behind this idea is that if the protection mechanism (parity or ECC) used for the primary copy indicates an error, the duplicate can provide the data as long as it is not itself corrupted. We implement two types of strategies based on this register duplication idea. In the “conservative strategy,” we limit ourselves with the give...
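The duplication idea in the abstract can be illustrated with a small software model. This is only a minimal sketch under stated assumptions, not the authors' hardware design: it assumes a single even-parity bit per register as the protection mechanism, a simple "copy into the first free register" policy, and hypothetical names (`RegisterFile`, `flip_bit`, `UnrecoverableError`) chosen for illustration. It does not model the two strategies mentioned in the abstract, whose description is truncated here.

```python
from dataclasses import dataclass
from typing import Dict, List

MASK = 0xFFFFFFFF  # assumed 32-bit registers (illustrative choice)


def parity(value: int) -> int:
    """Return 1 if the number of set bits is odd, else 0."""
    return bin(value).count("1") & 1


class UnrecoverableError(Exception):
    """Raised when the primary fails its check and no clean duplicate exists."""


@dataclass
class Entry:
    value: int = 0
    parity_bit: int = 0

    def is_clean(self) -> bool:
        return parity(self.value) == self.parity_bit


class RegisterFile:
    """Toy physical register file that mirrors live registers into free ones."""

    def __init__(self, num_physical: int) -> None:
        self.regs: List[Entry] = [Entry() for _ in range(num_physical)]
        self.free: List[int] = list(range(num_physical))
        self.duplicate_of: Dict[int, int] = {}  # primary index -> duplicate index

    def allocate(self) -> int:
        """Hand out a free physical register as a primary."""
        return self.free.pop(0)

    def write(self, primary: int, value: int) -> None:
        value &= MASK
        self.regs[primary] = Entry(value, parity(value))
        dup = self.duplicate_of.get(primary)
        if dup is None and self.free:
            # Assumed policy: use an unused register as the duplicate.
            dup = self.free.pop(0)
            self.duplicate_of[primary] = dup
        if dup is not None:
            self.regs[dup] = Entry(value, parity(value))

    def read(self, primary: int) -> int:
        entry = self.regs[primary]
        if entry.is_clean():
            return entry.value
        dup = self.duplicate_of.get(primary)
        if dup is not None and self.regs[dup].is_clean():
            # The duplicate supplies the data and repairs the primary.
            good = self.regs[dup]
            self.regs[primary] = Entry(good.value, good.parity_bit)
            return good.value
        raise UnrecoverableError(f"register {primary} is corrupted")

    def flip_bit(self, reg: int, bit: int) -> None:
        """Inject a single-bit transient fault into a stored value."""
        self.regs[reg].value ^= 1 << bit
```

In this model, a single-bit flip in the primary is detected by the parity check and the read is served from the duplicate; if both copies are corrupted, the read fails. Single-bit parity cannot detect an even number of flips in one register, which is one reason the abstract also mentions ECC.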
Gokhan Memik, Mahmut T. Kandemir, Ozcan Ozturk
Added 24 Jun 2010
Updated 24 Jun 2010
Type Conference
Year 2005
Where DATE