Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. In this way, the ...
For many aspects of memory theoretical treatment already exists, in particular for: simple cache construction, store buers and store buer forwarding, cache coherence protocols, o...
Ulan Degenbaev, Wolfgang J. Paul, Norbert Schirmer
A problem with running distributed shared memory applications in heterogeneous environments is that making optimal use of available resources often requires significant changes t...
Heterogeneous multicores, such as Cell BE processors and GPGPUs, typically do not have caches for their accelerator cores because coherence traffic, cache misses, and latencies fr...
This paper presents COBRA (Continuous Binary ReAdaptation), a runtime binary optimization framework, for multithreaded applications. It is currently implemented on Itanium 2 based...