The thesis of this research is that the task of exposing the parallelism in a given application should be left to the algorithm designer, who has intimate knowledge of the applica...
We present an optimized implementation of the linear scan register allocation algorithm for Sun Microsystems’ Java HotSpotTM client compiler. Linear scan register allocation is ...
In this paper, we describe a set of compiler analyses and an implementation that automatically map a sequential and un-annotated C program into a pipelined implementation, targete...
In conventional superscalar microarchitectures with partitioned integer and floating-point resources, all floating-point resources are idle during execution of integer programs....
S. Subramanya Sastry, Subbarao Palacharla, James E...
Graphics processing units (GPUs) provide both memory bandwidth and arithmetic performance far greater than that available on CPUs but, because of their Single-Instruction-Multiple...