Sciweavers

MIDDLEWARE
2004
Springer
13 years 10 months ago
Data pipelines: enabling large scale multi-protocol data transfers
Collaborating users need to move terabytes of data among their sites, often involving multiple protocols. This process is very fragile and involves considerable human involvement ...
Tevfik Kosar, George Kola, Miron Livny
IPPS
2005
IEEE
13 years 10 months ago
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...
PRDC
2007
IEEE
13 years 11 months ago
PAI: A Lightweight Mechanism for Single-Node Memory Recovery in DSM Servers
Several recent studies identify the memory system as the most frequent source of hardware failures in commercial servers. Techniques to protect the memory system from failures mus...
Jangwoo Kim, Jared C. Smolens, Babak Falsafi, Jame...
SOSP
2009
ACM
14 years 1 month ago
Tolerating hardware device failures in software
Hardware devices can fail, but many drivers assume they do not. When confronted with real devices that misbehave, these assumptions can lead to driver or system failures. While ma...
Asim Kadav, Matthew J. Renzelmann, Michael M. Swif...