Abstract. While standard processors achieve supercomputer performance, a performance gap exists between the interconnects of MPPs and those of COTS systems. Standard solutions like Ethernet ca...
If parallelism can be successfully exploited in a program, significant reductions in execution time can be achieved. However, if sections of the code are dominated by parallel ove...
We introduce a set of techniques to both measure and optimize memory access locality of Java applications running on cc-NUMA servers. These techniques work at the object level and...
We propose general-purpose natural heuristics for static block and block-cyclic heterogeneous data decomposition over the processes of a parallel program mapped onto a multidimensional g...
The ability to dynamically adapt an unstructured grid (or mesh) is a powerful tool for solving computational problems with evolving physical features; however, an efficient parall...
Rupak Biswas, Leonid Oliker, Sajal K. Das, Daniel ...