Sciweavers

442 search results for "Fault Tolerant Wide-Area Parallel Computing" (page 1 of 89)
IPPS
2000
IEEE
Fault Tolerant Wide-Area Parallel Computing
Executing parallel applications across distributed networks introduces the problem of fault tolerance. A viable solution for fault tolerance must keep overhead manageable and not c...
Jon B. Weissman
CLUSTER
2003
IEEE
Wide Area Cluster Monitoring with Ganglia
In this paper, we present a structure for monitoring a large set of computational clusters. We illustrate methods for scaling a monitor network comprised of many clusters while ke...
Federico D. Sacerdoti, Mason J. Katz, Matthew L. M...
SRDS
1996
IEEE
Exploiting Data-Flow for Fault-Tolerance in a Wide-Area Parallel System
Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will ...
Anh Nguyen-Tuong, Andrew S. Grimshaw, Mark Hyett
HPDC
1998
IEEE
A Fault Detection Service for Wide Area Distributed Computations
The potential for faults in distributed computing systems is a significant complicating factor for application developers. While a variety of techniques exist for detecting and co...
Paul Stelling, Ian T. Foster, Carl Kesselman, Crai...
COMPUTER
1999
Wide-Area Computing: Resource Sharing on a Large Scale
abstract over a complex set of resources and provide a high-level way to share and manage them over the network. To be effective, such a system must address the challenges posed by...
Andrew S. Grimshaw, Adam Ferrari, Frederick Knabe,...