Sciweavers

ISSTA
2010
ACM

Learning from 6,000 projects: lightweight cross-project anomaly detection

Real production code contains lots of knowledge—on the domain, on the architecture, and on the environment. How can we leverage this knowledge in new projects? Using a novel lightweight source code parser, we have mined more than 6,000 open source Linux projects (totaling 200,000,000 lines of code) to obtain 16,000,000 temporal properties reflecting normal interface usage. New projects can be checked against these rules to detect anomalies—that is, code that deviates from the wisdom of the crowds. In a sample of 20 projects, ∼25% of the top-ranked anomalies uncovered actual code smells or defects.

Categories and Subject Descriptors: D.2.1 [Software]: Software Engineering—Software/Program Verification, Requirements/Specifications; D.3.4 [Software]: Programming Languages—Processors

General Terms: Design, Experimentation, Languages, Verification

Keywords: lightweight parsing, language independent parsing, mining specifications, temporal properties, formal concept analysis
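The abstract describes a two-step idea: mine temporal properties such as "a is called before b" from a large corpus, then flag code in a new project that breaks a widely followed property. The Python sketch below shows that idea at toy scale. It is not the authors' parser or tool. It makes several assumptions. First, each function body has already been reduced to a list of called API names in source order. Second, a property is an ordered pair (a, b). Third, where the paper uses formal concept analysis, this sketch uses plain support/confidence thresholds instead. The function names (temporal_properties, mine_rules, find_anomalies), the thresholds and the toy data are all hypothetical.

```python
from collections import Counter

# Minimal sketch, NOT the authors' tool. Assumption: each function body has
# already been reduced to a list of called API names in source order.
# All function names, thresholds, and data here are hypothetical.


def temporal_properties(calls):
    """Return every 'a precedes b' pair observed in one call sequence."""
    props = set()
    for i, a in enumerate(calls):
        for b in calls[i + 1:]:
            if a != b:
                props.add((a, b))
    return props


def mine_rules(corpus, min_support=3, min_confidence=0.9):
    """Keep pairs (a, b) where, among sequences that call b, most call a first.

    Simple support/confidence filtering stands in for formal concept analysis.
    """
    pair_count = Counter()  # number of sequences in which a precedes b
    call_count = Counter()  # number of sequences that call b at all
    for calls in corpus:
        pair_count.update(temporal_properties(calls))
        call_count.update(set(calls))
    rules = {}
    for (a, b), support in pair_count.items():
        confidence = support / call_count[b]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


def find_anomalies(project, rules):
    """Flag functions that call b without calling a before it, highest confidence first."""
    anomalies = []
    for name, calls in project.items():
        props = temporal_properties(calls)
        for (a, b), confidence in rules.items():
            if b in calls and (a, b) not in props:
                anomalies.append((confidence, name, a, b))
    anomalies.sort(key=lambda item: item[0], reverse=True)
    return anomalies


# Toy data standing in for the mined corpus and for one new project.
corpus = [
    ["fopen", "fread", "fclose"],
    ["fopen", "fwrite", "fclose"],
    ["fopen", "fread", "fclose"],
    ["malloc", "fopen", "fclose"],
]
project = {
    "read_config": ["fread", "fclose"],  # never opens the file first
    "write_log": ["fopen", "fwrite", "fclose"],
}

rules = mine_rules(corpus)
print(rules)  # {('fopen', 'fclose'): 1.0}
print(find_anomalies(project, rules))  # [(1.0, 'read_config', 'fopen', 'fclose')]
```

Confidence is measured against every corpus sequence that calls b. A rule therefore survives only when nearly all observed uses agree, which is one simple way to express "the wisdom of the crowds". In this toy corpus, pairs like fopen before fread are dropped because they are seen too rarely to meet the hypothetical min_support of 3.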
Natalie Gruska, Andrzej Wasylkowski, Andreas Zeller
Added 15 Aug 2010
Updated 15 Aug 2010
Type Conference
Year 2010
Where ISSTA
Authors Natalie Gruska, Andrzej Wasylkowski, Andreas Zeller