Search Sciweavers | Sciweavers

ICDCS
2012
IEEE

Combining Partial Redundancy and Checkpointing for HPC

11 years 7 months ago

Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...

James Elliott, Kishor Kharbas, David Fiala, Frank ...

ICPP
2007
IEEE

Mercury: Combining Performance with Dependability Using Self-virtualization

13 years 11 months ago

Download ppi.fudan.edu.cn

There has recently been increasing interest in using system virtualization to improve the dependability of HPC cluster systems. However, it is not cost-free and may come with som...

Haibo Chen, Rong Chen, Fengzhe Zhang, Binyu Zang, ...

CCGRID
2006
IEEE

Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation

13 years 11 months ago

Download icl.cs.utk.edu

With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...

Yuan Tang, Graham E. Fagg, Jack Dongarra

FGCS
2002

HARNESS fault tolerant MPI design, usage and performance issues

13 years 4 months ago

Download www.netlib.org

Initial versions of MPI were designed to work efficiently on multi-processors which had very little job control and thus static process models. Subsequently forcing them to suppor...

Graham E. Fagg, Jack Dongarra

HPDC
2000
IEEE

Distributed Processor Allocation in Large PC Clusters

13 years 9 months ago

Download www.cs.hmc.edu

Current processor allocation techniques for highly parallel systems are based on centralized front-end based algorithms. As a result, the applied strategies are restricted to stat...

Hans-Ulrich Heiss, César A. F. De Rose, Phi...

