Modern graphics processing units (GPUs) include hardware-controlled caches to reduce bandwidth requirements and energy consumption. However, current GPU cache hierarchies are ineffi...
Yingying Tian, Sooraj Puthoor, Joseph L. Greathous...
Read-copy update (RCU) is a shared memory synchronization mechanism with scalable synchronization-free reads that nevertheless execute correctly with concurrent updates. To guaran...
This paper describes Surge, a collection-oriented programming model that enables programmers to compose parallel computations using nested high-level data collections and operator...
Saurav Muralidharan, Michael Garland, Bryan C. Cat...
As modern hardware keeps evolving, an increasingly effective approach to develop energy-efficient and high-performance solvers is to design them to work on many small size indepe...
Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stani...