Sciweavers

» Exploiting Data-Flow for Fault-Tolerance in a Wide-Area Para...
IPPS
2007
IEEE
14 years 2 days ago
The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI
To be able to fully exploit ever larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementati...
Joshua Hursey, Jeffrey M. Squyres, Timothy Mattox,...
ICDCS
1999
IEEE
13 years 10 months ago
Fault Tolerant Video on Demand Services
This paper describes a highly available distributed video on demand (VoD) service which is inherently fault tolerant. The VoD service is provided by multiple servers that reside at...
Tal Anker, Danny Dolev, Idit Keidar
ICS
2007
Tsinghua U.
13 years 12 months ago
Proactive fault tolerance for HPC with Xen virtualization
Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming commonplace. Current t...
Arun Babu Nagarajan, Frank Mueller, Christian Enge...
IPPS
1999
IEEE
13 years 10 months ago
High-Performance Knowledge Extraction from Data on PC-Based Networks of Workstations
The automatic construction of classifiers, programs able to correctly classify data collected from the real world, is one of the major problems in pattern recognition and in a wide ar...
Cosimo Anglano, Attilio Giordana, Giuseppe Lo Bell...
CLUSTER
2002
IEEE
13 years 5 months ago
Condor-G: A Computation Management Agent for Multi-Institutional Grids
In recent years, there has been a dramatic increase in the amount of available computing and storage resources. Yet few have been able to exploit these resources in an aggregated ...
James Frey, Todd Tannenbaum, Miron Livny, Ian T. F...