In this paper, we present our effort in developing an opensource GPU (graphics processing units) code library for the MATLAB Image Processing Toolbox (IPT). We ported a dozen of r...
Jingfei Kong, Martin Dimitrov, Yi Yang, Janaka Liy...
This paper proposes a technique that enables performing multi-cycle (multiplication, division, square-root ...) computations in a single cycle. The technique is based on the notio...
High performance compilers increasingly rely on accurate modeling of the machine resources to efficiently exploit the instruction level parallelism of an application. In this pape...
: In scalable parallel machines, processors can make local memory accesses much faster than they can make remote memory accesses. In addition, when a number of remote accesses must...
Abstract. Concurrent data structures with fine-grained synchronization are notoriously difficult to implement correctly. The difficulty of reasoning about these implementations do...