As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been...
Abstract. In an interpreted execution there is an interdependence between the interpreter's execution and the interpreted application's execution; the implementation of t...
Traditional DMA requires the operating system to perform many tasks to initiate a transfer, with overhead on the order of hundreds or thousands of CPU instructions. This paper des...
Matthias A. Blumrich, Cezary Dubnicki, Edward W. F...
The current multiprocessors such asCray T3D support interprocessor communication using partitioned dimension-order routers (PDRs). In a PDR implementation, the routing logic and sw...
The Data Diffusion Machine is a scalable virtual shared memory architecture. A hierarchical network is used to ensure that all data can be located in a time bounded by O(logp), wh...
Henk L. Muller, Paul W. A. Stallard, David H. D. W...