PLDI
2005
ACM

TraceBack: first fault diagnosis by reconstruction of distributed control flow

Faults that occur in production systems are the most important faults to fix, but most production systems lack the debugging facilities present in development environments. TraceBack provides debugging information for production systems by supplying execution history data about program problems (such as crashes, hangs, and exceptions). TraceBack supports features commonly found in production environments such as multiple threads, dynamically loaded modules, multiple source languages (e.g., Java applications running with JNI modules written in C++), and distributed execution across multiple computers. TraceBack supports first fault diagnosis: discovering what went wrong the first time a fault is encountered. The user can see how the program reached the fault state without having to re-run the computation, in effect enabling a limited form of a debugger in production code. TraceBack uses static, binary program analysis to inject low-overhead runtime instrumentation at control-flow block...
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where PLDI
Authors Andrew Ayers, Richard Schooler, Chris Metcalf, Anant Agarwal, Junghwan Rhee, Emmett Witchel