We consider a networking subsystem for message–passing clusters that uses two unidirectional queues for data transfers between the network interface card (NIC) and the lower prot...
A swap instruction, which exchanges a value in memory with a value of a register, is available on many architectures. The primary application of a swap instruction has been for pro...
Apan Qasem, David B. Whalley, Xin Yuan, Robert van...
In this paper, we present a new hybrid branch predictor called the GoStay2, which can effectively reduce indirect misprediction rates. The GoStay2 has two different mechanisms comp...
Operand bypass logic might be one of the critical structures for future microprocessors to achieve high clock speed. The delay of the logic imposes the execution time budget to be ...
High-accuracy PDE solvers use multi-dimensional fast Fourier transforms. The FFTs exhibits a static and structured memory access pattern which results in a large amount of communic...