Sciweavers

19 search results - page 2 / 4
» A proactive fault-detection mechanism in large-scale cluster...
CLUSTER
2002
IEEE
13 years 10 months ago
Algorithmic Mechanism Design for Load Balancing in Distributed Systems
Computational Grids are large-scale computing systems composed of geographically distributed resources (computers, storage, etc.) owned by self-interested agents or organizations. T...
Daniel Grosu, Anthony T. Chronopoulos
ICS
2007
Tsinghua U.
13 years 11 months ago
Proactive fault tolerance for HPC with Xen virtualization
Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming commonplace. Current t...
Arun Babu Nagarajan, Frank Mueller, Christian Enge...
IWCC
1999
IEEE
13 years 9 months ago
Nomad: A Scalable Operating System for Clusters of Uni and Multiprocessors
Recent improvements in workstation and interconnection network performance have popularized clusters of off-the-shelf workstations. However, the usefulness of these cluste...
Eduardo Pinheiro, Ricardo Bianchini
ICPP
2009
IEEE
13 years 12 months ago
Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems
Clusters and applications continue to grow in size while their mean time between failures (MTBF) is getting smaller. Checkpoint/Restart is becoming increasingly important for lar...
Xiangyong Ouyang, Karthik Gopalakrishnan, Dhabales...
IPPS
2006
IEEE
13 years 11 months ago
A distributed paging RAM grid system for wide-area memory sharing
Memory-intensive applications often suffer from the poor performance of disk swapping when memory is inadequate. Remote memory sharing schemes, which provide a remote memory that ...
Rui Chu, Nong Xiao, Yongzhen Zhuang, Yunhao Liu, X...