Abstract. Traditional parallel programming methodologies for improving performance assume cache-based parallel systems. However, new architectures, like the IBM Cyclops-64 (C64), b...
Elkin Garcia, Ioannis E. Venetis, Rishi Khan, Guan...
Abstract. Shared last level cache is crucial to performance. However, multithread program model incurs serious contention in shared cache. In this paper, to reduce average cache ac...
—This paper reports a study of mapping the Finite Difference Time Domain (FDTD) application to the IBM Cyclops64 (C64) many-core chip architecture [1]. C64 is chosen for this stu...
A recent development in radio astronomy is to replace traditional dishes with many small antennas. The signals are combined to form one large, virtual telescope. The enormous data ...
Moore’s Law suggests that the number of processing cores on a single chip increases exponentially. The future performance increases will be mainly extracted from thread-level par...
Nan Yuan, Yongbin Zhou, Guangming Tan, Junchao Zha...