While monitoring, instrumented long running parallel applications generate huge amount of instrumentation data. Processing and storing this data incurs overhead, and perturbs the ...
While lower supply voltage is effective for energy reduction, it suffers performance loss. To mitigate the loss, we propose to execute only the part, which does not have any influ...
A desired mesh architecture, based on connected-cycle modules, is constructed. To enhance the reliability, multiple bus sets and spare nodes are dynamically inserted to construct m...
This paper presents a hybrid shared memory architecture which combines the scalability of a multistage interconnection network with the contention reduction benefits of coherent c...
We report Linpack benchmark results on the TSUBAME supercomputer, a large scale heterogeneous system equipped with NVIDIA Tesla GPUs and ClearSpeed SIMD accelerators. With all of 1...