Sciweavers

» Distributed Construction of a Fault-Tolerant Network from a ...
HPDC
2009
IEEE
15 years 8 months ago
Interconnect agnostic checkpoint/restart in Open MPI
Long running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
Joshua Hursey, Timothy Mattox, Andrew Lumsdaine
ICDCS
2008
IEEE
15 years 8 months ago
stdchk: A Checkpoint Storage System for Desktop Grid Computing
Checkpointing is an indispensable technique to provide fault tolerance for long-running high-throughput applications like those running on desktop grids. This paper argues that...
Samer Al-Kiswany, Matei Ripeanu, Sudharshan S. Vaz...
MOBIDE
2010
ACM
15 years 2 months ago
Minimum-hot-spot query trees for wireless sensor networks
We propose a distributed algorithm to construct a balanced communication tree that serves in gathering data from the network nodes to a sink. Our algorithm constructs a near-optim...
Georgios Chatzimilioudis, Demetrios Zeinalipour-Ya...
ACSAC
1999
IEEE
15 years 6 months ago
Adding Availability to Log Services of Untrusted Machines
Uncorrupted log files are the critical system component for computer forensics in case of intrusion and for real time system monitoring and auditing. Protection from tampering wit...
Arianna Arona, Danilo Bruschi, Emilia Rosti
WWW
2003
ACM
15 years 7 months ago
WS-Membership - Failure Management in a Web-Services World
An important factor in the successful deployment of federated web-services-based business activities will be the ability to guarantee reliable distributed operation and execution....
Werner Vogels, Christopher Ré