Sciweavers

» Stateful Detection in High Throughput Distributed Systems
IPPS
2005
IEEE
15 years 3 months ago
Fault-Tolerant Parallel Applications with Dynamic Parallel Schedules
Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-the-shelf systems are not designed for high reliability. Node failures therefore...
Sebastian Gerlach, Roger D. Hersch
CLUSTER
2003
IEEE
15 years 2 months ago
A Performance Monitor Based on Virtual Global Time for Clusters of PCs
Debugging the performance of parallel and distributed systems remains a difficult task despite the widespread use of middleware packages for automatic distribution, communication...
Michela Taufer, Thomas Stricker
COOPIS
2002
IEEE
15 years 2 months ago
Composing and Deploying Grid Middleware Web Services Using Model Driven Architecture
Rapid advances in networking, hardware, and middleware technologies are facilitating the development and deployment of complex grid applications, such as large-scale distributed co...
Aniruddha S. Gokhale, Balachandran Natarajan
HIPC
2007
Springer
15 years 3 months ago
A Scalable Asynchronous Replication-Based Strategy for Fault Tolerant MPI Applications
As computational clusters increase in size, their mean-time-to-failure reduces. Typically checkpointing is used to minimize the loss of computation. Most checkpointing techniques, ...
John Paul Walters, Vipin Chaudhary
CLUSTER
2007
IEEE
15 years 4 months ago
Anomaly localization in large-scale clusters
A critical problem in managing large-scale clusters is identifying the location of problems in a system in case of unusual events. As the scale of high performance compu...
Ziming Zheng, Yawei Li, Zhiling Lan