Accurate fault detection is a key element of resilient computing. Syslogs provide key information regarding faults, and are found on nearly all computing systems. Discovering ne...
Real-time monitoring is becoming increasingly important in various scenarios of large-scale, multi-site distributed/parallel computing, e.g., understanding the behavior of systems, schedu...
Hidden information is a critical issue for the successful delivery of SLAs in grid systems. It arises when the agents (hardware and software resources) employed to serve a task be...
In this paper, we present DeployWare to address the deployment of distributed and heterogeneous software systems on large-scale infrastructures such as grids. Deployment of softwa...
Virtualization using a Xen-based virtual machine environment has yet to permeate the field of high-performance computing (HPC). One major requirement for HPC is the availability of ...