Abstract. Artificial Neural Networks (ANNs) and image processing requires massively parallel computation of simple operator accompanied by heavy memory access. Thus, this type of ...
Dongsun Kim, Hyunsik Kim, Hongsik Kim, Gunhee Han,...
Many large-scale production parallel programs often run for a very long time and require data checkpoint periodically to save the state of the computation for program restart and/o...
Wei-keng Liao, Kenin Coloma, Alok N. Choudhary, Le...
We present an algorithm for implementing byte-range locks using MPI passive-target one-sided communication. This algorithm is useful in any scenario in which multiple processes of ...
The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past dec...
With the advancements in realtime ray tracing and new global illumination algorithms we are now able to render the most important illumination effects at interactive rates. One of...