We consider unicast-based pipelined broadcast schemes for clusters connected by multiple Ethernet switches. By splitting a large broadcast message into segments and broadcasting t...
Field Programmable Gate Arrays (FPGAs) have improved tremendously in the last few years. They have been extended from simple logic circuits to complex Systems-on-Ch...
Memory-intensive applications often suffer from the poor performance of disk swapping when memory is inadequate. Remote memory sharing schemes, which provide a remote memory that ...
This paper presents the alternatives available to support threadprivate data in OpenMP and evaluates them. We show how current compilation systems rely on custom techniques for im...
Achieving high performance for concurrent applications on modern multiprocessors remains challenging. Many programmers avoid locking to improve performance, while others replace l...
Thomas E. Hart, Paul E. McKenney, Angela Demke Bro...