In this paper, we propose a novel hardware caching technique, called switch directory, to reduce the communication latency in CC-NUMA multiprocessors. The main idea is to implemen...
In order to reduce the overhead of synchronizing operations of shared memory multiprocessors, this paper proposes a mechanism, named specMEM, to execute memory accesses following ...
We consider networks of workstations which are not only timesharing, but also heterogeneous with a large variation in the computing power and memory capacities of different workst...
This paper describes performance tuning experiences with a three-dimensional unstructured grid Euler flow code from NASA, which we have reimplemented in the PETSc framework and p...
William Gropp, Dinesh K. Kaushik, David E. Keyes, ...
We compare the five candidates for the Advanced Encryption Standard based on their performance on the Alpha 21264, a 64-bit superscalar processor. There are several new features o...