Search Sciweavers | Sciweavers

90 search results - page 3 / 18

» System log pre-processing to improve failure prediction

CCGRID
2006
IEEE

Distributed And Parallel Com...

Exploit Failure Prediction for Adaptive Fault-Tolerance in Cluster Computing

13 years 11 months ago

Download www.cs.iit.edu

As the scale of cluster computing grows, it is becoming hard for long-running applications to complete without facing failures on large-scale clusters. To address this issue, chec...

Yawei Li, Zhiling Lan


DSN
2005
IEEE

Computer Networks

Probabilistic QoS Guarantees for Supercomputing Systems

13 years 11 months ago

Download adam.oliner.net

Supercomputing systems must be able to reliably and efficiently complete their assigned workloads, even in the presence of failures. This paper proposes a system that allows the ...

Adam J. Oliner, Larry Rudolph, Ramendra K. Sahoo, ...


ICPP
2007
IEEE

Distributed And Parallel Com...

A Meta-Learning Failure Predictor for Blue Gene/L Systems

13 years 11 months ago

Download www.mcs.anl.gov

The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...

Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev T...


ICPP
2008
IEEE

Distributed And Parallel Com...

Dynamic Meta-Learning for Failure Prediction in Large-Scale Systems: A Case Study

13 years 12 months ago

Download www.cs.iit.edu

Despite great efforts on the design of ultra-reliable components, the increase of system size and complexity has outpaced the improvement of component reliability. As a result, fa...

Jiexing Gu, Ziming Zheng, Zhiling Lan, John White,...


DSN
2006
IEEE

Computer Networks

BlueGene/L Failure Analysis and Prediction Models

13 years 11 months ago

Download www.ece.rutgers.edu

The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM’s BlueGene/L which can acc...

Yinglung Liang, Yanyong Zhang, Anand Sivasubramani...
