In this paper we employ two techniques suitable for embedded media processors. The first technique, extended subwords, uses four extra bits for every byte in a media register. Th...
Asadollah Shahbahrami, Ben H. H. Juurlink, Stamati...
Recent advances in GPU programmability and architecture have allowed for the generation of ray-cast or ray-traced images at interactive rates. How quickly these images can be ge...
We show empirically that some of the issues that affected the design of linear algebra libraries for distributed memory architectures will also likely affect such libraries for s...
Bryan Marker, Field G. Van Zee, Kazushige Goto, Gr...
In this paper, we use the tensor product notation as the framework of a programming methodology for designing block recursive algorithms on various computer networks. In our previ...
Given a certain function f, various methods have been proposed in the past for addressing the important problem of computing the matrix-vector product f(A)b without explicitly comp...