Sciweavers

SRDS
2006
IEEE

Recovering from Distributable Thread Failures with Assured Timeliness in Real-Time Distributed Systems

We consider the problem of recovering from failures of distributable threads with assured timeliness. When a node hosting a portion of a distributable thread fails, it causes orphans, i.e., segments of distributable threads that are disconnected from the thread’s root. We consider a termination model for recovering from such failures, in which the orphans must be detected and aborted, the resources they hold must be released and rolled back to safe states, and exceptions must be delivered to the farthest contiguous surviving thread segment, from which execution can be resumed. Since distributable threads are subject to time constraints in real-time distributed systems, such recovery must be conducted with assured timeliness. To this end, we present 1) a real-time scheduling algorithm called AUA, and 2) a distributable thread integrity protocol called TP-TR. We show that AUA and TP-TR bound the orphan cleanup and recovery time (thereby bounding thread starvation durations), maximize total...
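The termination model described in the abstract can be illustrated with a small sketch. This is not AUA or TP-TR, and it says nothing about timeliness; it only shows, under simplifying assumptions, how orphans and the resume point might be identified once a node failure is known. The `Segment` class, the `recover` function, the linear chain representation, and the deepest-first abort order are all hypothetical choices made for illustration, and rollback to a safe state is reduced to dropping the held resources.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """Hypothetical model of one segment of a distributable thread."""
    node: str                                   # node hosting this segment
    held_resources: List[str] = field(default_factory=list)
    aborted: bool = False


def recover(chain: List[Segment], failed_node: str) -> Tuple[Optional[Segment], List[Segment]]:
    """Apply a simplified termination model to one thread.

    chain[0] is the root segment; each later segment was invoked by the
    one before it. Returns (segment to resume at, aborted orphans).
    The resume segment is None if the thread never visited the failed
    node or if the root itself was on the failed node.
    """
    # First segment lost with the failed node; everything after it is
    # disconnected from the root.
    fail_idx = next((i for i, s in enumerate(chain) if s.node == failed_node), None)
    if fail_idx is None:
        return None, []

    # Orphans: downstream segments that still exist on surviving nodes.
    orphans = [s for s in chain[fail_idx + 1:] if s.node != failed_node]

    # Assumption: abort deepest-first and release whatever each orphan holds.
    for seg in reversed(orphans):
        seg.aborted = True
        seg.held_resources.clear()

    # Farthest contiguous surviving segment, where the exception is delivered.
    resume_at = chain[fail_idx - 1] if fail_idx > 0 else None
    return resume_at, orphans


# Usage: a thread spanning nodes A -> B -> C -> D, with node C failing.
chain = [Segment("A"), Segment("B", ["lock1"]), Segment("C"), Segment("D", ["lock2"])]
resume_at, orphans = recover(chain, "C")
```

In this example the segment on D is the only orphan, and the segment on B is the farthest contiguous survivor and keeps its resources.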
Added 12 Jun 2010
Updated 12 Jun 2010
Type Conference
Year 2006
Where SRDS
Authors Edward Curley, Jonathan Stephen Anderson, Binoy Ravindran, E. Douglas Jensen