Sciweavers

» A Performance Monitoring System for Large Computing Clusters
ISPAN
2005
IEEE
13 years 11 months ago
A Scalable Method for Predicting Network Performance in Heterogeneous Clusters
An important requirement for the effective scheduling of parallel applications on large heterogeneous clusters is a current view of system resource availability. Maintaining such ...
Dimitrios Katramatos, Steve J. Chapin
IPPS
2005
IEEE
13 years 11 months ago
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...
CCGRID
2008
IEEE
14 years 19 days ago
Scalable Data Gathering for Real-Time Monitoring Systems on Distributed Computing
Real-time monitoring is becoming increasingly important in various settings of large-scale, multi-site distributed/parallel computing, e.g., understanding behavior of systems, schedu...
Yoshikazu Kamoshida, Kenjiro Taura
ICDCS
2009
IEEE
14 years 3 months ago
REMO: Resource-Aware Application State Monitoring for Large-Scale Distributed Systems
To observe, analyze and control large scale distributed systems and the applications hosted on them, there is an increasing need to continuously monitor performance attributes of ...
Shicong Meng, Srinivas R. Kashyap, Chitra Venkatra...