Supercomputers are prone to frequent faults that adversely affect their performance, reliability and functionality. System logs collected on these systems are a valuable resource o...
As the size of available datasets in various domains is growing rapidly, there is an increasing need for scaling data mining implementations. Coupled with the current trends in co...
Large scale compute clusters continue to grow to ever-increasing proportions. However, as clusters and applications continue to grow, the Mean Time Between Failures (MTBF) has redu...
Traditional performance analysis techniques are performed after a parallel program has completed. In this paper, we describe an online method for continuously monitoring the perfor...
Isaac Dooley, Chee Wai Lee, Laxmikant V. Kal&eacut...
In many-core architectures, memory blocks are commonly assigned to the banks of a NUCA cache by following a physical mapping. This mapping assigns blocks to cache banks in a round-...
Alberto Ros, Marcelo Cintra, Manuel E. Acacio, Jos...