We introduce techniques to support efficient non-atomic execution of very long traces on a new binary translation based, x86-64 compatible VLIW microprocessor. Incrementally comm...
The available Instruction Level Parallelism in Java bytecode (Java-ILP) is not readily exploitable using traditional in-order or out-of-order issue mechanisms due to dependencies ...
R. Achutharaman, R. Govindarajan, G. Hariprakash, ...
Abstract— The execution environments For scientific applications have evolved significantly over the years. Vector and parallel architectures have provided significantly faste...
Instruction scheduling in general, and software pipelining in particular face the di cult task of scheduling operations in the presence of uncertain latencies. The largest contrib...
We present some quantitative performance measurements for the computing power of Programmable Active Memories (PAM), as introduced by [2]. Based on Field Programmable Gate Array (...