We introduce virtually-pipelined memory, an architectural technique that efficiently supports high-bandwidth, uniform latency memory accesses, and high-confidence throughput eve...
Instruction aggregation—the grouping of multiple operations into a single processing unit—is a technique that has recently been used to amplify the bandwidth and capacity of c...
A blossoming paradigm for block-recursive matrix algorithms is presented that, at once, attains excellent performance measured by • time, • TLB misses, • L1 misses, • L2 m...
Multiprocessor system-on-chip (MPSoC) is an integrated circuit containing multiple instruction-set processors on a single chip that implements most of the functionality of a compl...
The architectural design of embedded systems is becoming increasingly idiosyncratic to meet varying constraints regarding energy consumption, code size, and execution time. Tradit...