Sciweavers

26 search results - page 5 / 6
» Using Embedded Network Processors to Implement Global Memory...
ICS
2010
Tsinghua U.
15 years 8 months ago
Large-scale FFT on GPU clusters
A GPU cluster is a cluster equipped with GPU devices. Excellent acceleration is achievable for computation-intensive tasks (e.g. matrix multiplication and LINPACK) and bandwidth-i...
Yifeng Chen, Xiang Cui, Hong Mei
CLUSTER
2008
IEEE
15 years 10 months ago
High message rate, NIC-based atomics: Design and performance considerations
Remote atomic memory operations are critical for achieving high-performance synchronization in tightly-coupled systems. Previous approaches to implementing atomic memory operati...
Keith D. Underwood, Michael Levenhagen, K. Scott H...
IPPS
2000
IEEE
15 years 8 months ago
Thread Migration and Load Balancing in Non-Dedicated Environments
Networks of workstations are fast becoming the standard environment for parallel applications. However, the use of “found” resources as a platform for tightly-coupled runtime ...
Kritchalach Thitikamol, Peter J. Keleher
RTAS
2010
IEEE
15 years 1 month ago
Physicalnet: A Generic Framework for Managing and Programming Across Pervasive Computing Networks
This paper describes the design and implementation of a pervasive computing framework, named Physicalnet. Essentially, Physicalnet is a generic paradigm for managing and programmi...
Pascal Vicaire, Zhiheng Xie, Enamul Hoque, John A....
SC
1995
ACM
15 years 7 months ago
A Performance Evaluation of the Convex SPP-1000 Scalable Shared Memory Parallel Computer
The Convex SPP-1000 is the first commercial implementation of a new generation of scalable shared memory parallel computers with full cache coherence. It employs a hierarchical s...
Thomas L. Sterling, Daniel Savarese, Peter MacNeic...