Traditional compilers perform their code generation tasks based on a fixed, pre-determined instruction set. This paper describes the implementation of a compiler that determines ...
The increasing availability of multi-core and multiprocessor architectures provides new opportunities for improving the performance of many computer simulations. Markov Chain Mont...
Jonathan M. R. Byrd, Stephen A. Jarvis, A. H. Bhal...
In this paper, we present nearly optimal algorithms for broadcast on a d-dimensional nn:::n torus that supports all-port communication and wormhole routing. Let Tn denote the numb...
This paper presents results on a new approach to partitioning a modulo-scheduled loop for distributed execution on parallel clusters of functional units organized as a VLIW machin...
Marcio Merino Fernandes, Josep Llosa, Nigel P. Top...
Abstract. Shared last level cache is crucial to performance. However, multithread program model incurs serious contention in shared cache. In this paper, to reduce average cache ac...