Abstract. Exploiting the full computational power of current hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-...
Task graph scheduling has been found effective in performance prediction and optimization of parallel applications. A number of static scheduling algorithms have been proposed for...
In this paper, we argue for the power of providing a common set of OS services to wide area applications, including mechanisms for resource discovery, a global namespace, remote p...
Amin Vahdat, Thomas E. Anderson, Michael Dahlin, E...
Sparse LU factorization with partial pivoting is important for many scienti c applications and delivering high performance for this problem is di cult on distributed memory machin...
This paper aims to improve locality of references by suitably choosing array layouts. We use a new definition of spatial reuse vectors that takes into account memory layout of arra...
Mahmut T. Kandemir, Alok N. Choudhary, J. Ramanuja...