DataScalar architectures improve memory system performance by running computation redundantly across multiple processors, which are each tightly coupled with an associated memory....
Evaluation of high performance parallel systems is a delicate issue, due to the difficulty of generating workloads that represent, with fidelity, those that will run on actual sys...
This paper describes our early experiences with a preproduction Cray XMT system that implements a scalable shared memory architecture with hardware support for multithreading. Unl...
Embedded systems often operate under hard real-time constraints. Such systems are naturally described as time-bound reactions to external events, a point of view made manife...
Per Lindgren, Johan Eriksson, Simon Aittamaa, Joha...
This paper explores the scalability of the Stream Processor architecture along the instruction-, data-, and thread-level parallelism dimensions. We develop detailed VLSI-cost and ...