Abstract. Exploiting the full computational power of current hierarchical multiprocessor machines requires a very careful distribution of threads and data among the underlying non-...
—Shared cache allocation policies play an important role in determining CMP performance. The simplest policy, LRU, allocates cache implicitly as a consequence of its replacement ...
In this paper we describe Frontier, an FPGA placement system that uses design macro-blocks in conjuction with a series of placement algorithms to achieve highly-routable and high-...
We propose Instruction-based Prediction as a means to optimize directory-based cache coherent NUMA shared-memory. Instruction-based prediction is based on observing the behavior o...
Retiming is a widely investigated technique for performance optimization. It performs powerful modifications on a circuit netlist. However, often it is not clear, whether the pred...