Search Sciweavers | Sciweavers

21 search results - page 2 / 5

» Job-Site Level Fault Tolerance for Cluster and Grid environm...

CLUSTER
2003
IEEE

Coordinated Checkpoint versus Message Log for Fault Tolerant MPI

13 years 11 months ago

Download www.cs.utk.edu

Large clusters, high-availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...

Aurelien Bouteiller, Pierre Lemarinier, Gér...

ESCIENCE
2006
IEEE

Practical Fault-Tolerant Framework for eScience Infrastructure

13 years 11 months ago

Download dcslab.snu.ac.kr

Many areas of science currently use computing resources as an important part of their research, and many research groups adopt cluster architecture to use them efficiently and mana...

Hyuck Han, Jai Wug Kim, Jongpil Lee, Youngjin Yu, ...

CCGRID
2008
IEEE

An Autonomic Workflow Management System for Global Grids

14 years 4 days ago

Download www.gridbus.org

A Workflow Management System is generally utilized to define, manage and execute workflow applications on Grid resources. However, the increasing scale complexity, heterogeneity and...

Mustafizur Rahman 0003, Rajkumar Buyya

GRID
2003
Springer

Faults in Grids: Why are they so bad and What can be done about it?

13 years 11 months ago

Download www.dsc.ufcg.edu.br

Computational Grids have the potential to become the main execution platform for high performance and distributed applications. However, such systems are extremely complex and pro...

Raissa Medeiros, Walfredo Cirne, Francisco Vilar B...

CCGRID
2006
IEEE

Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation

13 years 11 months ago

Download icl.cs.utk.edu

With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...

Yuan Tang, Graham E. Fagg, Jack Dongarra
