As new processor and memory architectures advance, clusters start to be built from larger SMP systems, which makes MPI intra-node communication a critical issue in high performanc...
Clusters of commodity machines have become a popular way of building cheap high performance parallel computers. Many of these designs rely on standard Ethernet networks as a syste...
Francis Vaughan, Duncan A. Grove, Paul D. Coddingt...
In this paper, we propose two speculation-based methods for fast and accurate simulation-based performance and dependability analysis of complex systems, incorporating detailed si...
Yiqing Huang, Zbigniew Kalbarczyk, Ravishankar K. ...
VLIW architectures are popular in embedded systems because they offer high-performance processing at low cost and energy. The major problem with traditional VLIW designs is that t...
Hongtao Zhong, Kevin Fan, Scott A. Mahlke, Michael...
Abstract. This paper introduces ThreadMill - a distributed and parallel component architecture for applications that process large volumes of streamed (time-sequenced) data, such a...