Checkpointing is an indispensable technique to provide fault tolerance for long-running high-throughput applications like those running on desktop grids. This paper argues that...
Samer Al-Kiswany, Matei Ripeanu, Sudharshan S. Vaz...
Temperature hot-spots have been known to cause severe reliability problems and to significantly increase leakage power. The register file has been previously shown to exhibit the ...
While uniprocessor garbage collection is relatively well understood, experience with collectors for large multiprocessor servers is limited and it is unknown which techniques best ...
C. Richard Attanasio, David F. Bacon, Anthony Cocc...
Programmers of message-passing codes for clusters of workstations face a daunting challenge in understanding the performance bottlenecks of their applications. This is largely due...
We have taken a NIST molecular dynamics simulation program (md3), which was configured as a single sequential process running on a CRAY C90 vector supercomputer, and parallelized ...