Sciweavers

GI
2004
Springer
13 years 10 months ago
Crash Management for Distributed Parallel Systems
With the growing complexity of parallel architectures, the probability of system failures grows, too. One approach to cope with this problem is the self-healing, one of the organ...
Jan Haase, Frank Eschmann
HCW
1998
IEEE
13 years 9 months ago
CCS Resource Management in Networked HPC Systems
CCS is a resource management system for parallel high-performance computers. At the user level, CCS provides vendor-independent access to parallel systems. At the system administr...
Axel Keller, Alexander Reinefeld
IPPS
1999
IEEE
13 years 9 months ago
Lazy Logging and Prefetch-Based Crash Recovery in Software Distributed Shared Memory Systems
In this paper, we propose a new, efficient logging protocol, called lazy logging, and a fast crash recovery protocol, called the prefetch-based crash recovery (PCR), for software ...
Angkul Kongmunvattana, Nian-Feng Tzeng
GCC
2003
Springer
13 years 10 months ago
Network Storage Management in Data Grid Environment
This paper presents the Network Storage Manager (NSM) developed in the Distributed Computing Laboratory at Jackson State University. NSM is designed as a Java-based, high-performan...
Shaofeng Yang, Zeyad Ali, Houssain Kettani, Vinti ...