Sciweavers

DAC
2012
ACM
11 years 6 months ago
Cache revive: architecting volatile STT-RAM caches for enhanced performance in CMPs
Spin-Transfer Torque RAM (STT-RAM) is an emerging non-volatile memory (NVM) technology that has the potential to replace the conventional on-chip SRAM caches for designing a more ...
Adwait Jog, Asit K. Mishra, Cong Xu, Yuan Xie, Vij...
PLDI
2012
ACM
11 years 7 months ago
Adaptive input-aware compilation for graphics engines
While graphics processing units (GPUs) provide low-cost and efficient platforms for accelerating high performance computations, the tedious process of performance tuning required...
Mehrzad Samadi, Amir Hormati, Mojtaba Mehrara, Jan...
PPOPP
2012
ACM
12 years 1 days ago
PARRAY: a unifying array representation for heterogeneous parallelism
This paper introduces a programming interface called PARRAY (or Parallelizing ARRAYs) that supports system-level succinct programming for heterogeneous parallel systems like GPU c...
Yifeng Chen, Xiang Cui, Hong Mei
CSE
2012
IEEE
12 years 5 days ago
Accelerating Quantum Monte Carlo Simulations of Real Materials on GPU Clusters
—Continuum quantum Monte Carlo (QMC) has proved to be an invaluable tool for predicting the properties of matter from fundamental principles. By solving the manybody Schr¨odinge...
Kenneth Esler, Jeongnim Kim, David M. Ceperley, Lu...
ARC
2012
Springer
280views Hardware» more  ARC 2012»
12 years 6 days ago
Scalable Memory Hierarchies for Embedded Manycore Systems
As the size of FPGA devices grows following Moore’s law, it becomes possible to put a complete manycore system onto a single FPGA chip. The centralized memory hierarchy on typica...
Sen Ma, Miaoqing Huang, Eugene Cartwright, David L...
HIPEAC
2011
Springer
12 years 4 months ago
Decoupled zero-compressed memory
For each computer system generation, there are always applications or workloads for which the main memory size is the major limitation. On the other hand, in many cases, one could...
Julien Dusser, André Seznec
HIPEAC
2011
Springer
12 years 4 months ago
NoC-aware cache design for multithreaded execution on tiled chip multiprocessors
In chip multiprocessors (CMPs), data accesslatency dependson the memory hierarchy organization, the on-chip interconnect (NoC), and the running workload. Reducing data access late...
Ahmed Abousamra, Alex K. Jones, Rami G. Melhem
JCPHY
2011
192views more  JCPHY 2011»
12 years 7 months ago
Fast analysis of molecular dynamics trajectories with graphics processing units - Radial distribution function histogramming
The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in ...
Benjamin G. Levine, John E. Stone, Axel Kohlmeyer
PE
2010
Springer
175views Optimization» more  PE 2010»
12 years 11 months ago
Generalized ERSS tree model: Revisiting working sets
Accurately characterizing the resource usage of an application at various levels in the memory hierarchy has been a long-standing research problem. Existing characterization studi...
Ricardo Koller, Akshat Verma, Raju Rangaswami
IPPS
2010
IEEE
13 years 2 months ago
Restructuring parallel loops to curb false sharing on multicore architectures
The memory hierarchy of most multicore systems contains one or more levels of cache that is shared among multiple cores. The shared-cache architecture presents many opportunities f...
Santosh Sarangkar, Apan Qasem