Abstract. Current solutions for fault-tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration change...
Abstract. Performance analysis for terascale computing requires a combination of new concepts including distribution, on-line processing and automation. As a foundation for tools r...
Since I/O-intensive tasks running on a heterogeneous cluster need a highly effective usage of global I/O resources, previous CPU- or memory-centric load balancing schemes suffer ...
Xiao Qin, Hong Jiang, Yifeng Zhu, David R. Swanson
In a shared server infrastructure, a scheduler controls how quantities of resources are shared over time in a fair manner across multiple, competing consumers. It should support w...
We consider the impact of different communication architectures on the performability (performance + availability) of cluster-based servers. In particular, we use a combination of ...