Sciweavers

» Simulating Failures on Large-Scale Systems
ICPPW
2008
IEEE
13 years 10 months ago
Simulating Failures on Large-Scale Systems
Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene...
Narayan Desai, Ewing L. Lusk, Daniel Buettner, And...
CCGRID
2006
IEEE
13 years 10 months ago
A Failure-Aware Scheduling Strategy in Large-Scale Cluster System
As the scale is expanding, node failure becomes a commonplace feature of large-scale cluster systems. As an important part of cluster operating system software, job scheduling tak...
Linping Wu, Dan Meng, Jianfeng Zhan, Wang Lei, Bib...
IPPS
2005
IEEE
13 years 9 months ago
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...
DBKDA
2010
IEEE
13 years 2 months ago
Failure-Tolerant Transaction Routing at Large Scale
Emerging Web2.0 applications such as virtual worlds or social networking websites strongly differ from usual OLTP applications. First, the transactions are encapsulated in an AP...
Idrissa Sarr, Hubert Naacke, Stéphane Gan&c...
MASCOTS
2001
13 years 5 months ago
Large-Scale Simulation of Replica Placement Algorithms for a Serverless Distributed File System
Farsite is a scalable, distributed file system that logically functions as a centralized file server but that is physically implemented on a set of client desktop computers. Farsi...
John R. Douceur, Roger Wattenhofer