Sciweavers

73 search results - page 4 / 15
ICPP
2007
IEEE
15 years 4 months ago
Fault-Driven Re-Scheduling For Improving System-level Fault Resilience
The productivity of HPC systems is determined not only by their performance, but also by their reliability. The conventional method to limit the impact of failures is checkpointing...
Yawei Li, Prashasta Gujrati, Zhiling Lan, Xian-He ...
SRDS
2008
IEEE
15 years 4 months ago
Probabilistic Failure Detection for Efficient Distributed Storage Maintenance
Distributed storage systems often use data replication to mask failures and guarantee high data availability. Node failures can be transient or permanent. While the system must ge...
Jing Tian, Zhi Yang, Wei Chen, Ben Y. Zhao, Yafei ...
ICSE
2010
IEEE-ACM
15 years 1 month ago
Collaborative reliability prediction of service-oriented systems
Service-oriented architecture (SOA) is becoming a major software framework for building complex distributed systems. Reliability of the service-oriented systems heavily depends on...
Zibin Zheng, Michael R. Lyu
ICPP
2007
IEEE
15 years 4 months ago
A Meta-Learning Failure Predictor for Blue Gene/L Systems
The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...
Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev T...
IMAGING
2004
14 years 11 months ago
Failure of Luminance-Redness Correlation for Illuminant Estimation
We investigate the hypothesis, recently published in Nature, that the human visual system may use some sort of luminance-redness correlation together with the scene average for i...
Florian Ciurea, Brian V. Funt