Obtaining performance data for application or system software is typically di cult, especially when the source code is not available. While popular techniques such as event trappi...
This paper addresses the problem of selection and discovery of a consistent availability monitoring overlay for computer hosts in a large-scale distributed application, where host...
The lifecycle for industrial applications are becoming shorter, the application complexity increases, performance is to low, fault tolerance is required, reuse of components is de...
Debugging the performance of parallel and distributed systems remains a difficult task despite the widespread use of middleware packages for automatic distribution, communication...
This paper considers the problem of change detection using local distributed eigen monitoring algorithms for next generation of astronomy petascale data pipelines such as the Larg...