Sciweavers

Fault Tolerant Wide-Area Parallel Computing
IPPS
2000
IEEE
15 years 1 month ago
Fault Tolerant Wide-Area Parallel Computing
Executing parallel applications across distributed networks introduces the problem of fault tolerance. A viable solution for fault tolerance must keep overhead manageable and not c...
Jon B. Weissman
CLUSTER
2003
IEEE
15 years 2 months ago
Wide Area Cluster Monitoring with Ganglia
In this paper, we present a structure for monitoring a large set of computational clusters. We illustrate methods for scaling a monitor network comprised of many clusters while ke...
Federico D. Sacerdoti, Mason J. Katz, Matthew L. M...
SRDS
1996
IEEE
15 years 1 month ago
Exploiting Data-Flow for Fault-Tolerance in a Wide-Area Parallel System
Wide-area parallel processing systems will soon be available to researchers to solve a range of problems. In these systems, it is certain that host failures and other faults will ...
Anh Nguyen-Tuong, Andrew S. Grimshaw, Mark Hyett
HPDC
1998
IEEE
15 years 1 month ago
A Fault Detection Service for Wide Area Distributed Computations
The potential for faults in distributed computing systems is a significant complicating factor for application developers. While a variety of techniques exist for detecting and co...
Paul Stelling, Ian T. Foster, Carl Kesselman, Crai...
COMPUTER
1999
14 years 9 months ago
Wide-Area Computing: Resource Sharing on a Large Scale
abstract over a complex set of resources and provide a high-level way to share and manage them over the network. To be effective, such a system must address the challenges posed by...
Andrew S. Grimshaw, Adam Ferrari, Frederick Knabe,...