Robust Decision Trees: Removing Outliers from Databases

Finding and removing outliers is an important problem in data mining. Errors in large databases can be extremely common, so an important property of a data mining algorithm is robustness with respect to errors in the database. Most sophisticated methods in machine learning address this problem to some extent, but not fully, and can be improved by addressing the problem more directly. In this paper we examine C4.5, a decision tree algorithm that is already quite robust - few algorithms have been shown to consistently achieve higher accuracy. C4.5 incorporates a pruning scheme that partially addresses the outlier removal problem. In our ROBUST-C4.5 algorithm we extend the pruning method to fully remove the effect of outliers, and this results in improvement on many databases.
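The abstract does not spell out the procedure. One plausible reading of "extend the pruning method to fully remove the effect of outliers" is an iterative loop: fit a model, drop the training records it misclassifies, and refit until nothing more is dropped. The sketch below illustrates that reading only. It is not C4.5 and is not taken from the paper: `fit_stump` is a hypothetical one-feature threshold learner standing in for a pruned decision tree, and `robust_fit` and the toy data are likewise assumptions made for illustration.

```python
from collections import Counter


def fit_stump(rows):
    """Hypothetical stand-in for a pruned tree: the best single threshold
    on one numeric feature. rows is a list of (x, label) pairs.
    Returns a predict(x) function."""
    majority = Counter(y for _, y in rows).most_common(1)[0][0]
    # Baseline: always predict the majority label.
    best_errors = sum(y != majority for _, y in rows)
    best_predict = lambda x: majority
    xs = sorted({x for x, _ in rows})
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate split between adjacent values
        left = Counter(y for x, y in rows if x <= t).most_common(1)[0][0]
        right = Counter(y for x, y in rows if x > t).most_common(1)[0][0]
        errors = sum((left if x <= t else right) != y for x, y in rows)
        if errors < best_errors:
            best_errors = errors
            best_predict = lambda x, t=t, l=left, r=right: l if x <= t else r
    return best_predict


def robust_fit(rows, fit=fit_stump):
    """Assumed reading of the abstract: refit until the model classifies
    every remaining training record correctly."""
    rows = list(rows)
    while True:
        predict = fit(rows)
        kept = [(x, y) for x, y in rows if predict(x) == y]
        if len(kept) == len(rows):  # nothing was dropped, so stop
            return predict, rows
        rows = kept


# Toy data: low x is "a", high x is "b", with two mislabeled records.
data = [(1, "a"), (2, "a"), (3, "b"), (4, "a"), (5, "a"),
        (6, "b"), (7, "b"), (8, "a"), (9, "b"), (10, "b")]
predict, kept = robust_fit(data)
removed = [row for row in data if row not in kept]
print(removed)
```

On this toy data the loop drops the two mislabeled records, `(3, "b")` and `(8, "a")`, and the refit model separates the remaining records cleanly. Passing a real pruned-tree learner as `fit` would be the natural substitution; the stump is used here only to keep the example self-contained.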
George H. John
Added 26 Aug 2010
Updated 26 Aug 2010
Type Conference
Year 1995
Where KDD
Authors George H. John