Search Sciweavers | Sciweavers

716 search results - page 73 / 144

» Tolerating Faults in Synchronization Networks

ICDCS
2012
IEEE

238 views, Distributed And Parallel Com...

Combining Partial Redundancy and Checkpointing for HPC

13 years 7 months ago

Download moss.csc.ncsu.edu

Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...

James Elliott, Kishor Kharbas, David Fiala, Frank ...

DSN
2004
IEEE

144 views, Computer Networks

Implementing Simple Replication Protocols using CORBA Portable Interceptors and Java Serialization

15 years 9 months ago

Download homepages.laas.fr

The goal of this paper is to assess the value of simple features that are widely available in off-the-shelf CORBA and Java platforms for the implementation of fault-tolerance mecha...

Taha Bennani, Laurent Blain, Ludovic Courtè...

NSDI
2010

258 views, Computer Networks

MapReduce Online

15 years 6 months ago

Download neilconway.org

MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire outp...

Tyson Condie, Neil Conway, Peter Alvaro, Joseph M....

PODC
2009
ACM

165 views, Distributed and Parallel Com...

Fast scalable deterministic consensus for crash failures

16 years 4 days ago

Download carbon.ucdenver.edu

We study communication complexity of consensus in synchronous message-passing systems with processes prone to crashes. The goal in the consensus problem is to have all the nonfaul...

Bogdan S. Chlebus, Dariusz R. Kowalski, Michal Str...

EDCC
2008
Springer

110 views, Applied Computing

A Distributed Approach to Autonomous Fault Treatment in Spread

15 years 7 months ago

Download www.ux.uis.no

This paper presents the design and implementation of the Distributed Autonomous Replication Management (DARM) framework built on top of the Spread group communication system. The ...

Hein Meling, Joakim L. Gilje
