Clock synchronization is a crucial basic service in typical sensor networks, since the observations of distributed sensors more often than not need to be ordered ("a happened...
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with distinct computation and communication phases. Although the communication phase in such ...
Torsten Hoefler, Christian Siebert, Andrew Lumsdai...
Hierarchical culling is a key acceleration technique used to efficiently handle massive models for ray tracing, collision detection, etc. To support such hierarchical culling, bo...
Tae-Joon Kim, Yongyoung Byun, Yongjin Kim, Bochang...
This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information ...
Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. P...
Performance non-determinism in computer systems complicates evaluation, use, and even development of these systems. In performance evaluation via benchmarking and simulation, nond...