Cache partitioning and sharing is critical to the effective utilization of multicore processors. However, almost all existing studies have been evaluated by simulation that often ...
The behavior of a multithreaded program does not depend only on its inputs. Scheduling, memory reordering, timing, and low-level hardware effects all introduce nondeterminism in t...
Tom Bergan, Owen Anderson, Joseph Devietti, Luis C...
Portable embedded systems are being driven by consumer demands to be thermally efficient, perform faster, and have longer battery life. To design such a system, various hardware un...
Matrix computation algorithms often exhibit dependencies between neighboring elements inside loop nests such that the frontier between computed elements and those to be computed w...
We present a new deterministic sorting algorithm that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts n elements in O(n log n) time cache-oblivi...