Checkpointing is an indispensable technique to provide fault tolerance for long-running high-throughput applications like those running on desktop grids. This paper argues that...
Samer Al-Kiswany, Matei Ripeanu, Sudharshan S. Vaz...
Job scheduling typically focuses on the CPU with little work existing to include I/O or memory. Time-shared execution provides the chance to hide I/O and long-communication latenc...
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
Many application-specific wireless sensor network (WSN) systems require small size and low power features due to their limited resources, and their use in distributed, wireles...
Chung-Ching Shen, Roni Kupershtok, Bo Yang, Felice...
In cluster-based storage systems, the metadata server cluster must be able to adaptively distribute responsibility for metadata to maintain high system performance and long-term l...