Abstract. Many-core processor architectures require scalable solutions that reflect the locality and power constraints of future generations of technology. This paper presents a CMP architecture that supports automatic mapping and dynamic scheduling of threads leaving the binary code devoid of any explicit communication. The thrust of this approach is to produce binary code that is divorced from implementation parameters, yet, which still gives good performance over future generations of CMPs. A key component of this abstract processor architecture is the memory system. This paper evaluates the memory architectures, which must maintain performance across a range of targets.
Chris R. Jesshope, Mike Lankamp, Li Zhang