Sciweavers

26 search results - page 4 / 6
» A Performance Comparison of DRAM Memory System Optimizations...
Sort
View
PLDI
2003
ACM
13 years 11 months ago
A comparison of empirical and model-driven optimization
Empirical program optimizers estimate the values of key optimization parameters by generating different program versions and running them on the actual hardware to determine which...
Kamen Yotov, Xiaoming Li, Gang Ren, Michael Cibuls...
PLDI
2012
ACM
11 years 8 months ago
Adaptive input-aware compilation for graphics engines
While graphics processing units (GPUs) provide low-cost and efficient platforms for accelerating high performance computations, the tedious process of performance tuning required...
Mehrzad Samadi, Amir Hormati, Mojtaba Mehrara, Jan...
ISCA
2010
IEEE
214views Hardware» more  ISCA 2010»
13 years 8 months ago
Translation caching: skip, don't walk (the page table)
This paper explores the design space of MMU caches that accelerate virtual-to-physical address translation in processor architectures, such as x86-64, that use a radix tree page t...
Thomas W. Barr, Alan L. Cox, Scott Rixner
EUROPAR
2009
Springer
14 years 19 days ago
PSPIKE: A Parallel Hybrid Sparse Linear System Solver
The availability of large-scale computing platforms comprised of tens of thousands of multicore processors motivates the need for the next generation of highly scalable sparse line...
Murat Manguoglu, Ahmed H. Sameh, Olaf Schenk
SOSP
1997
ACM
13 years 7 months ago
Cashmere-2L: Software Coherent Shared Memory on a Clustered Remote-Write Network
Low-latency remote-write networks, such as DEC’s Memory Channel, provide the possibility of transparent, inexpensive, large-scale shared-memory parallel computing on clusters of...
Robert Stets, Sandhya Dwarkadas, Nikos Hardavellas...