Sciweavers

DAC
2011
ACM

DRAIN: distributed recovery architecture for inaccessible nodes in multi-core chips

12 years 3 months ago
DRAIN: distributed recovery architecture for inaccessible nodes in multi-core chips
As transistor dimensions continue to scale deep into the nanometer regime, silicon reliability is becoming a chief concern. At the same time, transistor counts are scaling up, enabling the design of highly integrated chips with many cores and a complex interconnect fabric, often a network on chip (NoC). Particularly problematic is the case when the accumulation of permanent hardware faults leads to disconnected cores in the system. In order to maintain correct system operation, it is necessary to salvage the data from these isolated nodes. In this work, we introduce a recovery mechanism targeting precisely this issue: DRAIN (Distributed Recovery Architecture for Inaccessible Nodes) provides system-level recovery from permanent failures. When an error disconnects a node from the network, DRAIN uses emergency links to transfer architectural state and cached data from disconnected nodes to nearby connected caches. DRAIN incurs zero performance penalty during normal operation, and is comp...
Andrew DeOrio, Konstantinos Aisopos, Valeria Berta
Added 18 Dec 2011
Updated 18 Dec 2011
Type Journal
Year 2011
Where DAC
Authors Andrew DeOrio, Konstantinos Aisopos, Valeria Bertacco, Li-Shiuan Peh
Comments (0)