Transport triggered architecture (TTA) has been shown to provide an efficient way to design application specific instruction set processors. However, the interconnection network of...
Many sorting algorithms have been studied in the past, but there are only a few algorithms that can effectively exploit both SIMD instructions and threadlevel parallelism. In this...
The floating-point unit in the Synergistic Processor Element of the 1st generation multi-core CELL Processor is described. The FPU supports 4-way SIMD single precision and intege...
Embedded digital signal processors for baseband communication systems have stringent design constraints including high computational bandwidth, low power consumption, and low inter...
Michael J. Schulte, C. John Glossner, Suman Mamidi...
This paper describes a sub-mW motion estimation processor core for MPEG-4 video encoding. It features a Gradient Descent Search algorithm whose computation power is only 7% of the...