We present design and implementation details as well as performance results for two new parallel checkpointing libraries developed by us for parallel MPI applications. The first o...
Checkpointing of parallel applications can be used as the core technology to provide process migration. Both, checkpointing and migration, are an important issue for parallel appl...
With the advent of Grid computing, more and more highend computational resources become available for use to a scientist. While this opens up new avenues for scientific research,...
We describe a new approach to real time learning of unknown functions based on an interpolating wavelet estimation. We choose a subfamily of a wavelet basis relying on nested hiera...
We present SGuard, a new fault-tolerance technique for distributed stream processing engines (SPEs) running in clusters of commodity servers. SGuard is less disruptive to normal s...
YongChul Kwon, Magdalena Balazinska, Albert G. Gre...