In a Chip Multi-Processor (CMP) with private caches, the last level cache is statically partitioned between all the cores. This prevents such CMPs from sharing cache capacity in r...
Many workload characterization studies depend on accurate measurements of the cost of executing a piece of code. Often these measurements are conducted using infrastructures to ac...
Dmitrijs Zaparanuks, Milan Jovic, Matthias Hauswir...
In this work we describe a parallel implementation of the Poisson Surface Reconstruction algorithm based on multigrid domain decomposition. We compare implementations using differ...
Matthew Bolitho, Michael M. Kazhdan, Randal C. Bur...
The adoption of transactional memory is hindered by the high overhead of software transactional memory and the intrusive design changes required by previously proposed TM hardware...
Jared Casper, Tayo Oguntebi, Sungpack Hong, Nathan...
1 Future gain in computing performance will not stem from increased clock rates, but from even more cores in a processor. Since automatic parallelization is still limited to easily...