Efficient partitioning of parallel loops plays a critical role in high performance and efficient use of multiprocessor systems. Although a significant amount of work has been don...
Arun Kejariwal, Alexandru Nicolau, Utpal Banerjee,...
This paper presents concurrent cache-oblivious (CO) B-trees. We extend the cache-oblivious model to a parallel or distributed setting and present three concurrent CO B-trees. Our ...
Michael A. Bender, Jeremy T. Fineman, Seth Gilbert...
In this paper, we describe a compilation system that automates much of the process of performance tuning that is currently done manually by application programmers interested in h...
Nastaran Baradaran, Jacqueline Chame, Chun Chen, P...
This paper presents a performance analysis of marketbased batch schedulers for clusters of workstations. In contrast to previous work, we use user-centric performance metrics as t...
In contrast to the common belief that OpenMP requires data-parallel extensions to scale well on architectures with non-uniform memory access latency, recent work has shown that it ...