Abstract. This paper introduces ThreadMill - a distributed and parallel component architecture for applications that process large volumes of streamed (time-sequenced) data, such a...
This paper presents techniques for efficient motion estimation (ME) implementation on a fixed-point digital signal processor (DSP) for high-resolution video coding. First, chal...
The physical layer of most wireless protocols is traditionally implemented in custom hardware to satisfy the heavy computational requirements while keeping power consumption to a ...
Yuan Lin, Hyunseok Lee, Mark Woh, Yoav Harel, Scot...
As new computer architectures are developed to exploit large-scale data-level parallelism, techniques are needed to retarget legacy sequential code to these platforms. Sequential ...
Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard ...
Emmanuel Agullo, Jack Dongarra, Rajib Nath, Stanim...