Sciweavers

2784 search results
» Instruction Level Parallelism
ICS
1999
Tsinghua U.
15 years 7 months ago
Application scaling under shared virtual memory on a cluster of SMPs
In this paper we examine how application performance scales on a state-of-the-art shared virtual memory (SVM) system on a cluster with 64 processors, comprising 4-way SMPs connect...
Dongming Jiang, Brian O'Kelley, Xiang Yu, Sanjeev ...
IPPS
1998
IEEE
15 years 7 months ago
Compiler-Optimization of Implicit Reductions for Distributed Memory Multiprocessors
This paper presents reduction recognition and parallel code generation strategies for distributed-memory multiprocessors. We describe techniques to recognize a broad range of implic...
Bo Lu, John M. Mellor-Crummey
SPAA
1998
ACM
15 years 7 months ago
Elimination Forest Guided 2D Sparse LU Factorization
Sparse LU factorization with partial pivoting is important for many scientific applications and delivering high performance for this problem is difficult on distributed memory machin...
Kai Shen, Xiangmin Jiao, Tao Yang
CF
2007
ACM
15 years 7 months ago
Converting massive TLP to DLP: a special-purpose processor for molecular orbital computations
We propose an application specific processor for computational quantum chemistry. The kernel of interest is the computation of electron repulsion integrals (ERIs), which vary in c...
Tirath Ramdas, Gregory K. Egan, David Abramson, Ki...
EUROPAR
2009
Springer
15 years 7 months ago
A Buffer Space Optimal Solution for Re-establishing the Packet Order in a MPSoC Network Processor
We consider a multi-processor system-on-chip destined for streaming applications. An application is composed of one input and one output queue and in-between, several levels of ide...
Daniela Genius, Alix Munier Kordon, Khouloud Zine ...