Efficient partitioning of parallel loops plays a critical role in high performance and efficient use of multiprocessor systems. Although a significant amount of work has been don...
Arun Kejariwal, Alexandru Nicolau, Utpal Banerjee,...
As device scales shrink, higher transistor counts are available while soft-errors, even in logic, become a major concern. A new class of architectures, such as Merrimac and the IB...
Mattan Erez, Nuwan Jayasena, Timothy J. Knight, Wi...
Abstract. The data redistribution problems on multi-computers had been extensively studied. Irregular data redistribution has been paid attention recently since it can distribute d...
A style for programming problems from matrix algebra is developed with a familiar example and new tools, yielding high performance with a couple of surprising exceptions. The under...
David S. Wise, Craig Citro, Joshua Hursey, Fang Li...
We propose an organization for the on-chip memory system of a chip multiprocessor, in which 16 processors share a 16MB pool of 256 L2 cache banks. The L2 cache is organized as a n...
Jaehyuk Huh, Changkyu Kim, Hazim Shafi, Lixin Zhan...