Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Motivation: Database search programs such as FASTA, BLAST, or a rigorous Smith–Waterman implementation produce lists of database entries, which are assumed to be related to the query....
This paper examines the explicit communication characteristics of several sophisticated scientific applications, which, by themselves, constitute a representative suite of publicl...
We present the Stack Trace Analysis Tool (STAT) to aid in debugging extreme-scale applications. STAT can reduce problem exploration spaces from thousands of processes to a few by ...
Dorian C. Arnold, Dong H. Ahn, Bronis R. de Supins...
Cooperation among autonomous agents has been discussed in the DAI community for several years. Papers about cooperation [6, 45], negotiation [33], distributed planning [5], and coalit...