In this paper we describe a trace analysis framework, from trace generation to visualization. It includes a unified tracing facility on IBM® SP™ systems, a self-defining interv...
Ching-Farn Eric Wu, Anthony Bolmarcich, Marc Snir,...
The use of a cluster for distributed performance analysis of parallel trace data is discussed. We propose an analysis architecture that uses multiple cluster nodes as a server to ...
The distributed nature and large scale of MapReduce programs and systems pose two challenges in using existing profiling and debugging tools to understand MapReduce pr...
Communication traces are integral to performance modeling and analysis of parallel programs. However, execution on a large number of nodes results in a large trace volume that i...
This paper develops a scalable online optimization framework for the autonomic performance management of distributed computing systems operating in a dynamic environment to satisf...