Conventional processors use a fully-associative store queue (SQ) to implement store-load forwarding. Associative search latency does not scale well to capacities and bandwidths re...
In the past, Field Programmable Gate Array (FPGA) circuits only contained a limited amount of logic and operated at a low frequency. Few applications running on FPGAs consumed exc...
Shared memory multiprocessors play an increasingly important role in enterprise and scientific computing facilities. Remote misses limit the performance of shared memory applicat...
Cache hierarchies have been traditionally designed for usage by a single application, thread or core. As multi-threaded (MT) and multi-core (CMP) platform architectures emerge and...
In this paper we introduce the Range Trie, a new multiway tree data structure for address lookup. Each Range Trie node maps to an address range [Na, Nb) and performs multiple comp...
Ioannis Sourdis, Georgios Stefanakis, Ruben de Sme...