To support dynamic address translation in today's microprocessors, the first-level cache is accessed in parallel with a translation lookaside buffer (TLB). However, this curre...
While standard processors achieve supercomputer performance, a performance gap exists between the interconnects of MPPs and COTS systems. Standard solutions like Ethernet ca...
This paper describes several algorithms to perform all-to-all communication on a two-dimensional mesh-connected computer with wormhole routing. We discuss both direct algorithms, ...
Distributed systems based on clusters of workstations are increasingly difficult to manage due to the growing number of processors involved and the complexity of associated appl...
A large multi-ported register file is indispensable for exploiting instruction level parallelism (ILP) in today's dynamically scheduled superscalar processors. The number of ...