Vector prefix and reduction are collective communication primitives in which all processors must cooperate. We present two parallel algorithms, the direct algorithm and the split ...
Trace cache, an instruction fetch technique that reduces taken branch penalties by storing and fetching program instructions in dynamic execution order, dramatically improves inst...
This paper deals with the computation of the rank and some integer Smith forms of a series of sparse matrices arising in algebraic K-theory. The number of non zero entries in the ...
Jean-Guillaume Dumas, Philippe Elbaz-Vincent, Pasc...
The combination of high-performance processing power and flexibility found in network processors (NPs) has made them a good solution for today’s packet processing needs. Similar...
Three dimensional (3D) graphics applications have become very important workloads running on today's computer systems. A cost-effective graphics solution is to perform geomet...