Sciweavers

32 search results - page 6 / 7
» Performance Implications of Failures in Large-Scale Cluster ...
ICDCS
2012
IEEE
11 years 7 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
ICDCS
1997
IEEE
13 years 9 months ago
Supporting Dynamic Space-sharing on Clusters of Non-dedicated Workstations
Clusters of workstations are increasingly being viewed as a cost-effective alternative to parallel supercomputers. However, resource management and scheduling on workstations clust...
Abdur Chowdhury, Lisa D. Nicklas, Sanjeev Setia, E...
STOC
2003
ACM
14 years 5 months ago
Work-competitive scheduling for cooperative computing with dynamic groups
The problem of cooperatively performing a set of t tasks in a decentralized setting where the computing medium is subject to failures is one of the fundamental problems in distrib...
Chryssis Georgiou, Alexander Russell, Alexander A....
DICS
2006
13 years 9 months ago
Fault-Tolerant Parallel Applications with Dynamic Parallel Schedules: A Programmer's Perspective
Dynamic Parallel Schedules (DPS) is a flow-graph-based framework for developing parallel applications on clusters of workstations. The DPS flow graph execution model enables automa...
Sebastian Gerlach, Basile Schaeli, Roger D. Hersch
ICNP
2003
IEEE
13 years 10 months ago
Data Dissemination with Ring-Based Index for Wireless Sensor Networks
In current sensor networks, sensor nodes are capable of not only measuring real world phenomena, but also storing, processing and transferring these measurements. Many data dissem...
Wensheng Zhang, Guohong Cao, Thomas F. La Porta