Sciweavers

PKDD
1999
Springer

Handling Missing Data in Trees: Surrogate Splits or Statistical Imputation

Abstract. In many applications of data mining a - sometimes considerable - part of the data values is missing. This may occur because the data values were simply never entered into the operational systems from which the mining table was constructed, or because for example simple domain checks indicate that entered values are incorrect. Despite the frequent occurrence of missing data, most data mining algorithms handle missing data in a rather ad-hoc way, or simply ignore the problem. We investigate simulation-based data augmentation to handle missing data, which is based on filling in (imputing) one or more plausible values for the missing data. One advantage of this approach is that the imputation phase is separated from the analysis phase, allowing for different data mining algorithms to be applied to the completed data sets. We compare the use of imputation to surrogate splits, such as used in CART, to handle missing data in tree-based mining algorithms. Experiments show that imputatio...
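
The separation of the imputation phase from the analysis phase can be illustrated with a minimal sketch. It is not the paper's method: the abstract describes simulation-based data augmentation, whereas the sketch below swaps in a much simpler random draw from the observed values of the same column. The function name impute_completed_sets, the parameters m and seed, and the toy data are all hypothetical, and the sketch assumes every column has at least one observed value.

```python
import random


def impute_completed_sets(rows, m=3, seed=0):
    """Return m completed copies of rows.

    Each None is replaced by a random draw from the observed values of
    the same column. This is a simple stand-in for illustration, not the
    simulation-based data augmentation the abstract refers to.
    """
    rng = random.Random(seed)
    n_cols = len(rows[0])
    # Collect the observed (non-missing) values of each column.
    observed = [[r[j] for r in rows if r[j] is not None] for j in range(n_cols)]
    completed = []
    for _ in range(m):
        completed.append([
            [v if v is not None else rng.choice(observed[j])
             for j, v in enumerate(r)]
            for r in rows
        ])
    return completed


# Toy table with missing entries marked as None (hypothetical data).
rows = [
    [1.0, 20.0],
    [2.0, None],
    [None, 40.0],
    [4.0, 50.0],
]
completed_sets = impute_completed_sets(rows, m=3)
```

Each element of completed_sets is a full table with no missing entries, so any analysis algorithm, tree-based or otherwise, could be run on it without its own missing-data handling.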
A. J. Feelders
Added 04 Aug 2010
Updated 04 Aug 2010
Type Conference
Year 1999
Where PKDD
Authors A. J. Feelders