Sciweavers

37 search results - page 2 / 8
» Distributed Watchpoints: Debugging Large Multi-Robot Systems
SOSP
2009
ACM
14 years 1 month ago
Debugging in the (very) large: ten years of implementation and experience
Windows Error Reporting (WER) is a distributed system that automates the processing of error reports coming from an installed base of a billion machines. WER has collected billion...
Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg...
HPDC
2006
IEEE
13 years 11 months ago
Troubleshooting Distributed Systems via Data Mining
Through massive parallelism, distributed systems enable the multiplication of productivity. Unfortunately, increasing the scale of available machines to users will also multiply d...
David A. Cieslak, Douglas Thain, Nitesh V. Chawla
ICDCS
2010
IEEE
13 years 8 months ago
Visual, Log-Based Causal Tracing for Performance Debugging of MapReduce Systems
The distributed nature and large scale of MapReduce programs and systems pose two challenges in using existing profiling and debugging tools to understand MapReduce pr...
Jiaqi Tan, Soila Kavulya, Rajeev Gandhi, Priya Nar...
OOPSLA
2007
Springer
13 years 11 months ago
Scalable omniscient debugging
Omniscient debuggers make it possible to navigate backwards in time within a program execution trace, drastically improving the task of debugging complex applications. Still, they...
Guillaume Pothier, Éric Tanter, José...
CCGRID
2008
IEEE
13 years 11 months ago
Scalable Data Gathering for Real-Time Monitoring Systems on Distributed Computing
Real-time monitoring is becoming increasingly important in various scenarios of large-scale, multi-site distributed/parallel computing, e.g., understanding behavior of systems, schedu...
Yoshikazu Kamoshida, Kenjiro Taura