As communication and I/O traffic increase on the interconnection network of high-performance systems, network contention becomes a critical problem drastically reducing performan...
In this paper, we present techniques and algorithms to improve the performance of various communication patterns on message-passing platforms where, for reasons of safety, user-le...
We describe three new Jacobi orderings for parallel computation of SVD problems on tree architectures. The rst ordering uses the high bandwidth of a perfect binary fat-tree to min...
Thanks to recent advances in virtualization technologies, it is now possible to benefit from the flexibility brought by virtual machines at little cost in terms of CPU performance....
This paper presents a novel stateless, virtualized communication engine for sub-microsecond latency. Using a Field-Programmable-Gate-Array (FPGA) based prototype we show a latency...