Modern DRAM systems rely on memory controllers that employ out-of-order scheduling to maximize row access locality and bank-level parallelism, which in turn maximizes DRAM bandwid...
The integration of microprocessors and field-programmable gate array (FPGA) fabric on a single chip increases both the utility and necessity of tools that automatically move softw...
Scott Sirowy, Yonghui Wu, Stefano Lonardi, Frank V...
The design and evaluation of microprocessor architectures is a difficult and time-consuming task. Although small, handcoded microbenchmarks can be used to accelerate performance e...
Clustering is a common technique to overcome the wire delay problem incurred by the evolution of technology. Fully-distributed architectures, where the register file, the functio...
Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...