We present new methods to extend data reliability of disks in RAID systems for applications like long term data archival. The proposed solutions extend existing algorithms to detec...
Modern scientific experiments can generate large amounts of data, which may be replicated and distributed across multiple resources to improve application performance and fault to...
The Grid provides infrastructure that allows an arbitrary application to be executed on a range of different computational resources. When input files are very large, or when faul...
Due to modern technology trends such as decreasing feature sizes and lower voltage levels, fault tolerance is becoming increasingly important in computing systems. Shared memory i...
In order to be able to tolerate the effects of faults, we must first detect the symptoms of faults, i.e. the errors. This paper evaluates the error detection properties of an erro...