Several methods have been proposed in the literature for the distribution of data on distributed memory machines, either oriented to dense or sparse structures. Many of the real a...
Exploiting instruction-level parallelism (ILP) is extremely important for achieving high performance in application specific instruction set processors (ASIPs) and embedded process...
Ramaswamy Govindarajan, Erik R. Altman, Guang R. G...
Loop distribution is an integral part of transforming a sequential program into a parallel one. It is used extensively in parallelization,vectorization, and memory management. For...
An effective way to reduce the number of aborts in software transactional memory (STM) is to keep multiple versions of transactional objects. In this paper, we study inherent prop...
Modulo scheduling is an e cient technique for exploiting instruction level parallelism in a variety of loops, resulting in high performance code but increased register requirement...
Alexandre E. Eichenberger, Edward S. Davidson, San...