Sciweavers

31 search results - page 2 / 7
» Optimizing Mechanisms for Latency Tolerance in Remote Memory...
HPCA
2007
IEEE
13 years 12 months ago
An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing
Shared memory multiprocessors play an increasingly important role in enterprise and scientific computing facilities. Remote misses limit the performance of shared memory applicat...
Liqun Cheng, John B. Carter, Donglai Dai
CF
2010
ACM
13 years 10 months ago
On-chip communication and synchronization mechanisms with cache-integrated network interfaces
Per-core local (scratchpad) memories allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architect...
Stamatis G. Kavadias, Manolis Katevenis, Michail Z...
CLUSTER
2009
IEEE
13 years 3 months ago
Design alternatives for implementing fence synchronization in MPI-2 one-sided communication for InfiniBand clusters
Scientific computing has seen an immense growth in recent years. The Message Passing Interface (MPI) has become the de-facto standard for parallel programming model for distribute...
Gopalakrishnan Santhanaraman, Tejus Gangadharappa,...
PVM
1998
Springer
13 years 9 months ago
Implementing MPI with the Memory-Based Communication Facilities on the SSS-CORE Operating System
This paper describes an efficient implementation of MPI on the Memory-Based Communication Facilities; Memory-Based FIFO is used for buffering by the library, and Remote Write for comm...
Kenji Morimoto, Takashi Matsumoto, Kei Hiraki
ASPLOS
1991
ACM
13 years 9 months ago
Code Generation for Streaming: An Access/Execute Mechanism
Access/execute architectures have several advantages over more traditional architectures. Because address generation and memory access are decoupled from operand use, memory laten...
Manuel E. Benitez, Jack W. Davidson