On systems with multi-core processors, the memory access scheduling scheme plays an important role not only in utilizing the limited memory bandwidth but also in balancing the pro...
This paper examines the scalable parallel implementation of QR factorization of a general matrix, targeting SMP and multi-core architectures. Two implementations of algorithms-by-...
This paper proposes and studies a distributed L2 cache management approach through page-level data to cache slice mapping in a future processor chip comprising many cores. L2 cach...
The paper introduces Self-Replicating Objects (SROs), a new nt programming abstraction. An SRO is implemented and used much like an ordinary .NET object and can expose arbitrary us...
This paper presents a mechanism for the separation of control and data flow in NoC-based SoCs consisting of multiple heterogeneous reconfigurable IP cores. This mechanism enables ...