Matrix transpose in parallel systems typically involves costly all-to-all communications. In this paper, we provide a comparative characterization of various efficient algorithms f...
Abstract. Given a symmetric positive definite matrix A, we compute a structured approximate Cholesky factorization A ≈ R^T R up to any desired accuracy, where R is an upper triangula...
While technology trends have ushered in the age of chip multiprocessors (CMP) and enabled designers to place an increasing number of cores on chip, a fundamental question is what ...
Divya Gulati, Changkyu Kim, Simha Sethumadhavan, S...