With the growing complexity in computer systems, it has been a real challenge to detect and diagnose problems in today’s large-scale distributed systems. Usually, the correlatio...
This paper proposes a new fine-grained data distribution operation, MPI_Alltoall_specific, that allows an element-wise distribution of data elements to specific target pro...
Dynamic instrumentation systems have proven to be extremely valuable for program introspection, architectural simulation, and bug detection. Yet a major drawback of modern instrum...
Real-time monitoring is becoming increasingly important in various scenarios of large-scale, multi-site distributed/parallel computing, e.g., understanding behavior of systems, schedu...
A new RAID-x (redundant array of inexpensive disks at level x) architecture is presented for distributed I/O processing on a serverless cluster of computers. The RAID-x architectu...