Scalable busy-wait synchronization algorithms are essential for achieving good parallel program performance on large scale multiprocessors. Such algorithms include mutual exclusio...
Robert W. Wisniewski, Leonidas I. Kontothanassis, ...
Researchers have proposed using hardware data compression units within the memory hierarchies of microprocessors in order to improve performance, energy efficiency, and functional...
Xi Chen, Lei Yang, Haris Lekatsas, Robert P. Dick,...
This paper presents high-performance collective communication algorithms and implementations that exploit the unique architectural features of the Cell heterogeneous multicore pro...
In this paper, we implement an efficient, completely software-based graphics pipeline on a GPU. Unlike previous approaches, we obey ordering constraints imposed by current graphi...
Web services, with an emphasis on open standards and flexibility, may provide benefits over existing capital markets integration practices. However, web services must first meet c...