A distributed load-based failure recovery mechanism for advance reservation environments


Resource reservations in advance are a mature concept for the allocation of various resources, particularly in grid environments. Common grid toolkits support advance reservations and assign jobs to resources at admission time. In such a distributed environment, it is necessary to develop carefully tailored failure recovery mechanisms that provide seamless, transparent migration of jobs from one resource to another. As the migration of running jobs is difficult, an important issue in advance reservation, i.e., planning-based, management infrastructures is to determine the duration of a failure in order to remap jobs that are already allocated to a currently failed resource but not yet active. As shown in previous work, underestimating the failure duration, and consequently remapping too few jobs, results in an increased number of job terminations. In order to overcome this drawback, in this paper we propose a load-based computation of the jobs to be remapped. A centrali...

Lars-Olof Burchard, Barry Linnert, Joerg Schneider

Real-time Traffic

Advance Reservation | CCGRID 2005 | Cluster Computing | Failure Duration | Toolkits Support Advance |


Type	Conference
Year	2005
Where	CCGRID
Authors	Lars-Olof Burchard, Barry Linnert, Joerg Schneider
