CC-NUMA is a widely adopted and deployed architecture of high performance computers. These machines are attractive for their transparent access to local and remote memory. However...
Profile data is valuable for identifying performance bottlenecks and guiding optimizations. Periodic sampling of a processor's performance monitoring hardware is an effective...
Jeffrey Dean, James E. Hicks, Carl A. Waldspurger,...
Intertask/interprocess synchronization overheads may be significant in a multiprocessor-shared memory System-on-a-Chip implementation. These overheads are observed in terms of loc...
Bilge Saglam Akgul, Jaehwan Lee, Vincent John Moon...
In this paper, we present a novel hybrid multiplier architecture that has the regularity of linear array multipliers and the performance of tree multipliers and is highly scalable...
We describe a new approach to object replication in Java, aimed at improving the performance of parallel programs. Our programming model allows the programmer to define groups of ...