Many interesting large-scale systems are distributed systems of multiple communicating components. Such systems can be very hard to debug, especially when they exhibit poor perfor...
Marcos Kawazoe Aguilera, Jeffrey C. Mogul, Janet L...
This paper presents Capriccio, a scalable thread package for use with high-concurrency servers. While recent work has advocated event-based systems, we believe that thread-based sy...
J. Robert von Behren, Jeremy Condit, Feng Zhou, Ge...
Software defects significantly reduce system dependability. Among various types of software bugs, semantic and concurrency bugs are two of the most difficult to detect. This pape...
Shan Lu, Soyeon Park, Chongfeng Hu, Xiao Ma, Weiha...
Diagnosing production run failures is a challenging yet important task. Most previous work focuses on offsite diagnosis, i.e. development site diagnosis with the programmers prese...
Joseph Tucek, Shan Lu, Chengdu Huang, Spiros Xanth...
Computer systems often fail due to many factors such as software bugs or administrator errors. Diagnosing such production run failures is an important but challenging task since i...
Ding Yuan, Haohui Mai, Weiwei Xiong, Lin Tan, Yuan...