Sciweavers

207 search results - page 2 / 42
ICPP
2007
IEEE
13 years 11 months ago
A Meta-Learning Failure Predictor for Blue Gene/L Systems
The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...
Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev T...
SRDS
2008
IEEE
13 years 11 months ago
Probabilistic Failure Detection for Efficient Distributed Storage Maintenance
Distributed storage systems often use data replication to mask failures and guarantee high data availability. Node failures can be transient or permanent. While the system must ge...
Jing Tian, Zhi Yang, Wei Chen, Ben Y. Zhao, Yafei ...
ICDCS
2002
IEEE
13 years 10 months ago
Process Migration: A Generalized Approach Using a Virtualizing Operating System
Process migration has been used to perform specialized tasks, such as load sharing and checkpoint/restarting long running applications. Implementation typically consists of modifi...
Tom Boyd, Partha Dasgupta
GRID
2007
Springer
13 years 11 months ago
High-available grid services through the use of virtualized clustering
Grid applications comprise several components and web-services that make them highly prone to the occurrence of transient software failures and aging problems. This type of failur...
Javier Alonso, Luís Moura Silva, Artur Andr...
IPPS
2005
IEEE
13 years 11 months ago
Dynamic Delay-Fault Injection for Reconfigurable Hardware
Modern internet and telephone switches consist of numerous VLSI-circuits operating at high frequencies to handle high bandwidths. It is beyond question that such systems must cont...
Bernhard Fechner