With the growing importance of energy consumption and the new requirements of emerging applications, in-network processing of information is gaining recognition as a viab...
Due to shared-cache contention and interconnect delays, data prefetching is more critical in alleviating penalties from increasing memory latencies and demands on Chip-Multiproce...
Xudong Shi, Zhen Yang, Jih-Kwon Peir, Lu Peng, Yen...
The HPC Challenge (HPCC) benchmark suite and the Intel MPI Benchmark (IMB) are used to compare and evaluate the combined performance of processor, memory subsystem and interconnec...
Subhash Saini, Robert Ciotti, Brian T. N. Gunney, ...
This paper describes a parallel implementation of our recently developed algorithm for phylogenetic analysis on the IBM BlueGene/L cluster [15]. This algorithm constructs evolutio...
Bing Bing Zhou, Daniel Chu, Monther Tarawneh, Ping...
In this paper, we study dynamic protocol update (DPU). Unlike on-the-fly local code updates, DPU requires global coordination of local code replacements. We propose a novel ...