We have developed a methodology for predicting the performance of parallel algorithms on real parallel machines. The methodology consists of two steps. First, we characterize a mac...
Abstract. Concurrent data structures with fine-grained synchronization are notoriously difficult to implement correctly. The difficulty of reasoning about these implementations do...
This paper describes the implementation of a runtime library for asynchronous communication in the Cell BE processor. The runtime library implementation provides with several servi...
Abstract. The ROMIO implementation of the MPI-IO standard provides a portable infrastructure for use on top of any number of different underlying storage targets. These targets va...
Robert Latham, Robert B. Ross, Rajeev Thakur, Bria...
Motion Estimation (ME) is the most computationally intensive part of video compression and video enhancement systems. One bit transform (1BT) based ME algorithms have low computat...