Sciweavers

Search: Hiding Communication Latency in Data Parallel Applications

ISCA 2000 (IEEE)
An embedded DRAM architecture for large-scale spatial-lattice computations
Spatial-lattice computations with finite-range interactions are an important class of easily parallelized computations. This class includes many simple and direct algorithms for ...
Norman Margolus

PC 2002
The Chebyshev iteration revisited
Compared to Krylov space methods based on orthogonal or oblique projection, the Chebyshev iteration does not require inner products and is therefore particularly suited for massiv...
Martin H. Gutknecht, Stefan Röllin
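
The abstract above notes that the Chebyshev iteration needs no inner products. As a rough illustration, here is a minimal, unpreconditioned sketch of the classical textbook Chebyshev recurrence in Python. It is not necessarily the variant analyzed by Gutknecht and Röllin. It assumes that A is symmetric positive definite and that bounds lmin and lmax enclosing its spectrum are known in advance. The function name chebyshev_solve and its parameters are hypothetical. The step sizes alpha and beta depend only on the interval center d and half-width c. As a result, the loop never computes a vector inner product to choose its coefficients. Its only operation that touches remote data is the matrix-vector product, which for stencil or sparse matrices needs only neighbor communication.

```python
def chebyshev_solve(A, b, lmin, lmax, iters):
    """Chebyshev iteration for A x = b (hypothetical helper, no preconditioner).

    Assumes A is symmetric positive definite with all eigenvalues in [lmin, lmax].
    Dense lists are used only to keep the sketch dependency-free.
    """
    n = len(b)

    def matvec(M, v):
        # Plain dense matrix-vector product; the only data-coupling step.
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]

    d = (lmax + lmin) / 2.0  # center of the assumed spectral interval
    c = (lmax - lmin) / 2.0  # half-width of the assumed spectral interval
    x = [0.0] * n            # start from x = 0
    r = list(b)              # residual r = b - A x for x = 0
    p = [0.0] * n
    alpha = 0.0
    for i in range(1, iters + 1):
        if i == 1:
            p = list(r)
            alpha = 1.0 / d
        else:
            # Coefficients come from d and c only: no inner products needed.
            if i == 2:
                beta = 0.5 * (c * alpha) ** 2
            else:
                beta = (c * alpha / 2.0) ** 2
            alpha = 1.0 / (d - beta / alpha)
            p = [r[k] + beta * p[k] for k in range(n)]
        x = [x[k] + alpha * p[k] for k in range(n)]
        Ap = matvec(A, p)
        r = [r[k] - alpha * Ap[k] for k in range(n)]  # r = r - alpha * A p
    return x
```

For example, chebyshev_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], 2.0, 5.0, 30) should approach the exact solution [1/11, 7/11], because the eigenvalues (7 ± √5)/2 ≈ 2.38 and 4.62 lie inside [2, 5]. The sketch relies on good spectral bounds. If lmax underestimates the largest eigenvalue, the iteration can diverge.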

WWW 2005 (ACM)
LSH forest: self-tuning indexes for similarity search
We consider the problem of indexing high-dimensional data for answering (approximate) similarity-search queries. Similarity indexes prove to be important in a wide variety of sett...
Mayank Bawa, Tyson Condie, Prasanna Ganesan

ASPLOS 1996 (ACM)
An Integrated Compile-Time/Run-Time Software Distributed Shared Memory System
On a distributed memory machine, hand-coded message passing leads to the most efficient execution, but it is difficult to use. Parallelizing compilers can approach the performance...
Sandhya Dwarkadas, Alan L. Cox, Willy Zwaenepoel

ICS 2009 (Tsinghua U.)
MPI-aware compiler optimizations for improving communication-computation overlap
Several existing compiler transformations can help improve communication-computation overlap in MPI applications. However, traditional compilers treat calls to the MPI library as ...
Anthony Danalis, Lori L. Pollock, D. Martin Swany,...