Registers in processors generally contain words or, with the addition of multimedia extensions, short vectors of subwords such as bytes or 16-bit elements. In this paper, we view the c...
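A minimal SIMD-within-a-register (SWAR) sketch can illustrate what it means for a register to hold a short vector of subwords. It is not the method of the paper excerpted above, whose approach is cut off here. It assumes a 64-bit register holding eight unsigned byte lanes, and the helper names (`pack_bytes`, `unpack_bytes`, `swar_add_bytes`) are hypothetical.

```python
# Hypothetical SWAR sketch: a 64-bit word treated as eight independent 8-bit lanes.

MASK64 = (1 << 64) - 1
HIGH = 0x8080808080808080  # bit 7 of every byte lane
LOW = 0x7F7F7F7F7F7F7F7F   # bits 0..6 of every byte lane


def pack_bytes(lanes):
    """Pack eight byte values (lane 0 = least significant) into one 64-bit word."""
    assert len(lanes) == 8 and all(0 <= b <= 0xFF for b in lanes)
    word = 0
    for i, b in enumerate(lanes):
        word |= b << (8 * i)
    return word


def unpack_bytes(word):
    """Split a 64-bit word back into its eight byte lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def swar_add_bytes(a, b):
    """Lane-wise addition modulo 256 with no carry crossing a lane boundary.

    Adding only the low 7 bits of each lane can never carry into the next
    lane; the top bit of each lane is then fixed up with an XOR.
    """
    low_sum = (a & LOW) + (b & LOW)
    return (low_sum ^ ((a ^ b) & HIGH)) & MASK64
```

For example, `unpack_bytes(swar_add_bytes(pack_bytes([250, 1, 2, 3, 4, 5, 6, 255]), pack_bytes([10, 1, 2, 3, 4, 5, 6, 1])))` gives `[4, 2, 4, 6, 8, 10, 12, 0]`: the overflows in lanes 0 and 7 wrap within their own lanes instead of spilling into neighbours.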
With systems such as Road Runner, there is a trend in supercomputing to offload parallel tasks to special-purpose co-processors, composed of many relatively simple scalar proc...
Matthew Badin, Lubomir Bic, Michael B. Dillencourt...
In designing application-specific bit-level architectures and in programming existing bit-level processor arrays, it is necessary to expand a word-level algorithm into its bit-...
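To illustrate what expanding a word-level operation into bit-level operations can look like, the sketch below rewrites an unsigned n-bit multiplication as AND-gated partial products accumulated by ripple-carry chains of one-bit full adders. This is a generic textbook expansion chosen for illustration, not the expansion procedure of the paper excerpted above; all function names are hypothetical.

```python
# Hypothetical sketch: word-level a * b expanded into single-bit operations.

def full_adder(x, y, c):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = x ^ y ^ c
    carry = (x & y) | (x & c) | (y & c)
    return s, carry


def to_bits(v, n):
    """Little-endian list of the n low-order bits of v."""
    return [(v >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


def ripple_add(xs, ys):
    """Add two equal-length little-endian bit vectors with a chain of full adders."""
    out, c = [], 0
    for x, y in zip(xs, ys):
        s, c = full_adder(x, y, c)
        out.append(s)
    out.append(c)  # final carry becomes one extra high-order bit
    return out


def bitlevel_multiply(a, b, n):
    """Unsigned n-bit a * b using only bit operations.

    Partial product j is (a AND b_j) shifted left by j bits; partial products
    are accumulated with ripple-carry adders in a 2n-bit accumulator, which is
    always wide enough because a * b < 2**(2n).
    """
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    width = 2 * n
    acc = [0] * width
    a_bits, b_bits = to_bits(a, n), to_bits(b, n)
    for j in range(n):
        partial = [0] * j + [ai & b_bits[j] for ai in a_bits]
        partial += [0] * (width - len(partial))
        acc = ripple_add(acc, partial)[:width]
    return from_bits(acc)
```

Each `full_adder` call corresponds to one bit-level cell, so the loop structure shows how a single word-level multiply fans out into roughly n² bit-level operations.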
A blossoming paradigm for block-recursive matrix algorithms is presented that, at once, attains excellent performance measured by time, TLB misses, L1 misses, L2 m...
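The control structure of a block-recursive matrix algorithm can be sketched as quadrant divide-and-conquer multiplication. This is a generic illustration, not the paradigm of the paper excerpted above. It assumes square matrices whose order is a power of two, and the names `mat_add`, `split`, `join`, and `block_recursive_matmul` are hypothetical. Block-recursive codes that target TLB and cache behaviour typically also store matrices in a recursive layout (for example Morton, or Z, order) so that each quadrant is contiguous in memory; this sketch copies quadrants out of row-major lists and therefore shows only the recursion, not the memory behaviour.

```python
# Hypothetical sketch: block-recursive (quadrant) matrix multiplication,
# assuming n x n matrices with n a power of two.

def mat_add(A, B):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Split a square matrix of even order into quadrants (11, 12, 21, 22)."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def join(C11, C12, C21, C22):
    """Reassemble four quadrants into one matrix."""
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom


def block_recursive_matmul(A, B, leaf=2):
    """C = A B, recursing on quadrants until blocks are at most leaf x leaf."""
    n = len(A)
    if n <= leaf:
        # Base case: ordinary triple loop on a small block.
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    def mm(X, Y):
        return block_recursive_matmul(X, Y, leaf)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    C11 = mat_add(mm(A11, B11), mm(A12, B21))
    C12 = mat_add(mm(A11, B12), mm(A12, B22))
    C21 = mat_add(mm(A21, B11), mm(A22, B21))
    C22 = mat_add(mm(A21, B12), mm(A22, B22))
    return join(C11, C12, C21, C22)
```

The `leaf` parameter plays the role of a blocking cutoff: recursion stops once a block is small enough to be handled by a simple loop.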
The mixed-signal processor performs digital vector-matrix multiplication using internally analog, fine-grain parallel computing. The three-transistor CID/DRAM unit cell ...
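One common way to obtain an exact digital vector-matrix product from binary (0/1) parallel accumulations is bit-plane decomposition. Each operand is written in binary, a 0/1 partial dot product is formed for every pair of bit planes, and the partial results are combined with digital shifts and adds. Because the excerpt above is truncated, this is not a claim about the processor's actual architecture. The sketch assumes unsigned integer operands, and the names `bit_planes` and `vmm_bitplane` are hypothetical.

```python
# Hypothetical sketch: y = W x via bit-plane decomposition,
#   y_r = sum_i sum_k 2**(i + k) * sum_j w_rj^(i) * x_j^(k),
# where w_rj^(i) and x_j^(k) are single bits of the unsigned operands.

def bit_planes(v, bits):
    """Plane i holds bit i of every element of v."""
    return [[(x >> i) & 1 for x in v] for i in range(bits)]


def vmm_bitplane(W, x, wbits, xbits):
    """Unsigned integer product W x from binary-binary partial products.

    The inner sum over j is a 0/1 dot product, the kind of fine-grain
    parallel accumulation an analog array could compute; the shifts and
    additions around it are digital post-processing.
    """
    x_planes = bit_planes(x, xbits)
    y = []
    for row in W:
        w_planes = bit_planes(row, wbits)
        acc = 0
        for i, wp in enumerate(w_planes):
            for k, xp in enumerate(x_planes):
                partial = sum(wj & xj for wj, xj in zip(wp, xp))
                acc += partial << (i + k)
        y.append(acc)
    return y
```

For instance, `vmm_bitplane([[3, 5, 7], [15, 0, 2]], [1, 2, 3], wbits=4, xbits=2)` returns `[34, 21]`, matching the ordinary product.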