Deploying Grid technologies by distributing an application over several machines has been widely used for scientific simulations, which have large requirements for computational r...
Researchers and practitioners in the area of parallel and distributed computing have been lacking a portable, flexible and robust distributed instrumentation system. We present th...
Aleksandar M. Bakic, Matt W. Mutka, Diane T. Rover
Memory dependence prediction allows out-of-order issue processors to achieve high degrees of instruction level parallelism by issuing load instructions at the earliest time withou...
Loops are the main time consuming part of programs based on floating point computations. The performance of the loops is limited either by recurrences in the computation or by the...
Improving memory performance at software level is more effective in reducing the rapidly expanding gap between processor and memory performance. Loop transformations (e.g. loop un...
Surendra Byna, Xian-He Sun, William Gropp, Rajeev ...