In this paper we examine how application performance scales on a state-of-the-art shared virtual memory (SVM) system on a cluster with 64 processors, comprising 4-way SMPs connect...
Dongming Jiang, Brian O'Kelley, Xiang Yu, Sanjeev ...
This paper presents reduction recognition and parallel code generationstrategies for distributed-memorymultiprocessors. We describe techniques to recognize a broad range of implic...
Sparse LU factorization with partial pivoting is important for many scienti c applications and delivering high performance for this problem is di cult on distributed memory machin...
We propose an application specific processor for computational quantum chemistry. The kernel of interest is the computation of electron repulsion integrals (ERIs), which vary in c...
Tirath Ramdas, Gregory K. Egan, David Abramson, Ki...
We consider a multi-processor system-on-chip destined for streaming applications. An application is composed of one input and one output queue and in-between, several levels of ide...