Sciweavers

ISCA
1996
IEEE
15 years 9 months ago
Coherent Network Interfaces for Fine-Grain Communication
Historically, processor accesses to memory-mapped device registers have been marked uncacheable to ensure their visibility to the device. The ubiquity of snooping cache coherence, ...
Shubhendu S. Mukherjee, Babak Falsafi, Mark D. Hil...
ICCS
2009
Springer
15 years 11 months ago
Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes
The development of optimized codes is time-consuming and requires extensive architecture, compiler, and language expertise; therefore, computational scientists are often forced to ...
Boyana Norris, Albert Hartono, Elizabeth R. Jessup...
IPPS
2000
IEEE
15 years 8 months ago
Bandwidth-Efficient Collective Communication for Clustered Wide Area Systems
Metacomputing infrastructures couple multiple clusters (or MPPs) via wide-area networks. A major problem in programming parallel applications for such platforms is their hierarchi...
Thilo Kielmann, Henri E. Bal, Sergei Gorlatch
ISCA
2006
IEEE
15 years 11 months ago
Architectural Semantics for Practical Transactional Memory
Transactional Memory (TM) simplifies parallel programming by allowing for parallel execution of atomic tasks. Thus far, TM systems have focused on implementing transactional stat...
Austen McDonald, JaeWoong Chung, Brian D. Carlstro...
WOSP
2004
ACM
15 years 10 months ago
Collecting whole-system reference traces of multiprogrammed and multithreaded workloads
The simulated evaluation of memory management policies relies on reference traces—logs of memory operations performed by running processes. No existing approach to reference tra...
Scott F. Kaplan