We consider storage in an extremely large-scale distributed computer system designed for stream processing applications. In such systems, incoming data and intermediate results ma...
Kirsten Hildrum, Fred Douglis, Joel L. Wolf, Phili...
Performing experimental evaluation of fault tolerant distributed systems is a complex and tedious task, and automating as much as possible of the execution and evaluation of exper...
Fast track is a software speculation system that enables unsafe optimization of sequential code. It speculatively runs optimized code to improve performance and then checks the ...
Checkpointing is an indispensable technique to provide fault tolerance for long-running high-throughput applications like those running on desktop grids. This paper argues that...
Samer Al-Kiswany, Matei Ripeanu, Sudharshan S. Vaz...
The increasing demand for resources of the high performance computing systems has led to new forms of collaboration of distributed systems such as interoperable grid systems tha...