Fair and balanced?: bias in bug-fix datasets



Software engineering researchers have long been interested in where and why bugs occur in code, and in predicting where they might turn up next. Historical bug-occurrence data has been key to this research. Bug tracking systems and code version histories record when, how, and by whom bugs were fixed; from these sources, datasets that relate file changes to bug fixes can be extracted. These historical datasets can be used to test hypotheses concerning processes of bug introduction, and also to build statistical bug prediction models. Unfortunately, processes and humans are imperfect, and only a fraction of bug fixes are actually labelled in source code version histories, and thus become available for study in the extracted datasets. The question naturally arises: are the bug fixes recorded in these historical datasets a fair representation of the full population of bug fixes? In this paper, we investigate historical data from several software projects, and find strong evidence of syste...



Bug Fixes | Bug Prediction Models | Historical Datasets | SIGSOFT 2009 | Software Engineering


Post Info

Added	19 Nov 2009
Updated	19 Nov 2009
Type	Conference
Year	2009
Where	SIGSOFT
Authors	Christian Bird, Adrian Bachmann, Eirik Aune, John Duffy, Abraham Bernstein, Vladimir Filkov, Premkumar T. Devanbu
