Sciweavers

70 search results - page 2 / 14
» Co-designing the failure analysis and monitoring of large-sc...
IPPS
2005
IEEE
13 years 11 months ago
Monitoring and Debugging Parallel Software with BCS-MPI on Large-Scale Clusters
Buffered CoScheduled (BCS) MPI is a novel implementation of MPI based on global synchronization of all system activities. BCS-MPI imposes a model where all processes and their com...
Juan Fernández, Fabrizio Petrini, Eitan Fra...
UPP
2004
Springer
13 years 10 months ago
Grassroots Approach to Self-management in Large-Scale Distributed Systems
Traditionally, autonomic computing is envisioned as replacing the human factor in the deployment, administration and maintenance of computer systems that are ever more co...
Özalp Babaoglu, Márk Jelasity, Alberto...
OSDI
2008
ACM
14 years 5 months ago
Mining Console Logs for Large-Scale System Problem Detection
The console logs generated by an application contain messages that the application developers believed would be useful in debugging or monitoring the application. Despite the ubiq...
Wei Xu, Ling Huang, Armando Fox, David A. Patterso...
ECBS
2005
IEEE
13 years 11 months ago
Prototype of Fault Adaptive Embedded Software for Large-Scale Real-Time Systems
This paper describes a comprehensive prototype of large-scale fault adaptive embedded software developed for the proposed Fermilab BTeV high energy physics experiment. Lightweight...
Derek Messie, Mina Jung, Jae C. Oh, Shweta Shetty,...
ISSRE
2010
IEEE
13 years 3 months ago
A Large-Scale Industrial Case Study on Architecture-Based Software Reliability Analysis
Architecture-based software reliability analysis methods shall help software architects to identify critical software components and to quantify their influence on the system r...
Heiko Koziolek, Bastian Schlich, Carlos G. Bilich