The identification and processing of similarities in the data play a key role in multiple application scenarios. Several types of similarity-aware operations have been studied in ...
Bulk memory copies incur large overheads such as CPU stalling (i.e., no overlap of computation with memory copy operation), small register-size data movement, cache pollution, etc...
Karthikeyan Vaidyanathan, Lei Chai, Wei Huang, Dha...
Wide Single Instruction, Multiple Thread (SIMT) architectures often require a static allocation of thread groups that are executed in lockstep throughout the entire application ker...
This paper discusses a reliable group communication system using active messages to update shared objects. We discuss the model, implementation techniques, and our preliminary per...
To keep up with a large degree of instruction level parallelism (ILP), the Itanium 2 cache systems use a complex organization scheme: load/store queues, banking and interleaving. ...
William Jalby, Christophe Lemuet, Sid Ahmed Ali To...