Sciweavers

» GPU clusters for high-performance computing
HIPC
2007
Springer
15 years 3 months ago
A Scalable Asynchronous Replication-Based Strategy for Fault Tolerant MPI Applications
As computational clusters increase in size, their mean-time-to-failure reduces. Typically checkpointing is used to minimize the loss of computation. Most checkpointing techniques, ...
John Paul Walters, Vipin Chaudhary
CLUSTER
2002
IEEE
14 years 9 months ago
Online Prediction of the Running Time of Tasks
We describe and evaluate the Running Time Advisor (RTA), a system that can predict the running time of a compute-bound task on a typical shared, unreserved commodity host...
Peter A. Dinda
PPOPP
2012
ACM
13 years 5 months ago
PARRAY: a unifying array representation for heterogeneous parallelism
This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports system-level succinct programming for heterogeneous parallel systems like GPU c...
Yifeng Chen, Xiang Cui, Hong Mei
CLUSTER
2004
IEEE
15 years 1 months ago
Improved message logging versus improved coordinated checkpointing for fault tolerant MPI
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H...
ISMS
2004
Springer
15 years 2 months ago
An Interactive Parallel Multigrid FEM Simulator
Physically based modeling of deformable objects such as cloth or human tissue has grown to be very important for virtual simulations. However, interactive simulation of these nonl...
Xunlei Wu, Tolga Goktekin, Frank Tendick