Sciweavers

OSDI
2008
ACM
14 years 5 months ago
CLUEBOX: A Performance Log Analyzer for Automated Troubleshooting
S. Ratna Sandeep, M. Swapna, Thirumale Niranjan, S...
OSDI
2008
ACM
14 years 5 months ago
Carnegie Mellon's CyDAT: Harnessing a Wide Array of Telemetry Data to Enhance Distributed System Diagnostics
The number and complexity of distributed applications have exploded, and to date, each has had to create its own method for providing diagnostic tools and performance metrics. Thes...
Chas DiFatta, Mark Poepping, Daniel V. Klein
OSDI
2008
ACM
14 years 5 months ago
Hunting for Problems with Artemis
Artemis is a modular application designed for analyzing and troubleshooting the performance of large clusters running datacenter services. Artemis is composed of four modules: (1)...
Gabriela F. Cretu-Ciocarlie, Mihai Budiu, Moisé...
OSDI
2008
ACM
14 years 5 months ago
Mining Console Logs for Large-Scale System Problem Detection
The console logs generated by an application contain messages that the application developers believed would be useful in debugging or monitoring the application. Despite the ubiq...
Wei Xu, Ling Huang, Armando Fox, David A. Patterso...
OSDI
2008
ACM
14 years 5 months ago
Finding Similar Failures Using Callstack Similarity
We develop a machine-learned similarity metric for Windows failure reports using telemetry data gathered from clients describing the failures. The key feature is a tuned callstack...
Kevin Bartz, Jack W. Stokes, John C. Platt, Ryan K...
OSDI
2008
ACM
14 years 5 months ago
From Optimization to Regret Minimization and Back Again
Internet routing is mostly based on static information: its dynamicity is limited to reacting to changes in topology. Adaptive performance-based routing decisions would not o...
Ioannis C. Avramopoulos, Jennifer Rexford, Robert ...
OSDI
2008
ACM
14 years 5 months ago
HiLighter: Automatically Building Robust Signatures of Performance Behavior for Small- and Large-Scale Systems
Previous work showed that statistical analysis techniques could successfully be used to construct compact signatures of distinct operational problems in Internet server systems. B...
Armando Fox, Moisés Goldszmidt, Peter Bodí...
OSDI
2008
ACM
14 years 5 months ago
Empirical Comparison of Techniques for Automated Failure Diagnosis
Automated techniques to diagnose the cause of system failures based on monitoring data are an active area of research at the intersection of systems and machine learning. In this p...
Songyun Duan, Shivnath Babu
OSDI
2008
ACM
14 years 5 months ago
Probabilistic Inference in Queueing Networks
Although queueing models have long been used to model the performance of computer systems, they are out of favor with practitioners, because they have a reputation for requiring u...
Charles A. Sutton, Michael I. Jordan