Sciweavers

CCGRID
2008
IEEE
13 years 4 months ago
Fault Tolerance and Recovery of Scientific Workflows on Computational Grids
In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our ...
Gopi Kandaswamy, Anirban Mandal, Daniel A. Reed
CCGRID
2008
IEEE
13 years 6 months ago
A Decentralized and Cooperative Workflow Scheduling Algorithm
In current approaches to workflow scheduling, there is no cooperation between the distributed workflow brokers and, as a result, the problem of conflicting schedules occurs. To o...
Rajiv Ranjan, Mustafizur Rahman, Rajkumar Buy...
CCGRID
2008
IEEE
13 years 6 months ago
Using Probabilistic Characterization to Reduce Runtime Faults in HPC Systems
The current trend in high performance computing is to aggregate ever larger numbers of processing and interconnection elements in order to achieve desired levels of compu...
Jim M. Brandt, Bert J. Debusschere, Ann C. Gentile...
CCGRID
2008
IEEE
13 years 6 months ago
Benefits of Job Exchange between Autonomous Sites in Decentralized Computational Grids
This paper examines the job exchange between parallel compute sites in a decentralized Grid scenario. Here, the local scheduling system remains untouched and continues normal oper...
Christian Grimme, Joachim Lepping, Alexander Papas...
CCGRID
2008
IEEE
13 years 6 months ago
A Probabilistic Model to Analyse Workflow Performance on Production Grids
Production grids are complex and highly variable systems whose behavior is not well understood and is difficult to anticipate. The goal of this study is to estimate the impact of the ...
Tristan Glatard, Johan Montagnat, Xavier Pennec
CCGRID
2008
IEEE
13 years 6 months ago
Fault Tolerance in Cluster Federations with O2P-CF
Fault tolerance is one of the key issues for large scale applications executed on high performance computing systems. In a cluster federation, clusters are gathered to provide hug...
Thomas Ropars, Christine Morin
CCGRID
2008
IEEE
13 years 6 months ago
AMP: An Affinity-Based Metadata Prefetching Scheme in Large-Scale Distributed Storage Systems
Prefetching is an effective technique for improving file access performance, which can reduce access latency for I/O systems. In a distributed storage system, prefetching for metadat...
Lin Lin, Xueming Li, Hong Jiang, Yifeng Zhu, Lei T...
CCGRID
2008
IEEE
13 years 6 months ago
Orchestrating Data-Centric Workflows
When orchestrating data-centric workflows as are commonly found in the sciences, centralised servers can become a bottleneck to the performance of a workflow; output from service i...
Adam Barker, Jon B. Weissman, Jano I. van Hemert
CCGRID
2008
IEEE
13 years 6 months ago
View-Based Collective I/O for MPI-IO
This paper presents the design and implementation of a new file-system-independent collective I/O optimization based on file views: view-based collective I/O. View-based collective...
Francisco Javier García Blas, Florin Isaila...