Multicluster architectures overcome the scaling problem of centralized resources by distributing the datapath, register file, and memory subsystem across multiple clusters connec...
As a pedagogical exercise in ACL2, we formalize and prove the correctness of a write invalidate cache scheme. In our formalization, an arbitrary number of processors, each with its...
Phase-decoupled methods for code generation are the state of the art in compilers for standard processors but generally produce code of poor quality for irregular target architect...
3D-integration is a promising technology to help combat the “Memory Wall” in future multi-core processors. Past work has considered using 3D-stacked DRAM as a large last-level...
Efficient memory hierarchy design is critical due to the increasing gap between the speed of the processors and the memory. One of the sources of inefficiency in current caches is...