The size of supercomputers in numbers of processors is growing exponentially. Today’s largest supercomputers have upwards of a hundred thousand processors and tomorrow’s may ha...
Mustafa M. Tikir, Michael Laurenzano, Laura Carrin...
Fault-tolerant distributed protocols typically utilize a homogeneous fault model, either fail-crash or fail-Byzantine, where all processors are assumed to fail in the same manner....
Multi-agent systems designed to work collaboratively with groups of people typically require private information that people will entrust to them only if they have assurance that ...
Rachel Greenstadt, Barbara J. Grosz, Michael D. Sm...
Monitoring is the task of collecting measurements that reflect the state of a system. Administration is a collection of tasks for the control and manipulation of computer systems. Monito...
Because of increasing hardware and software complexity, the running time of many computational science applications is now more than the mean-time-to-failure of high-performance com...
Greg Bronevetsky, Daniel Marques, Keshav Pingali, ...