CSDA
2004

A note on split selection bias in classification trees

A common approach to split selection in classification trees is to search through all possible splits generated by the predictor variables. A splitting criterion is then used to evaluate those splits, and the one with the largest criterion value is usually chosen to channel samples into the corresponding subnodes. However, this greedy method is biased in variable selection when the number of available split points differs across variables. Such a result may thus hamper the intuitively appealing nature of classification trees. The problem of split selection bias for two-class tasks with numerical predictors is examined. A statistical explanation of its existence is given, and a solution based on P-values is provided for the case where the Pearson chi-square statistic is used as the splitting criterion. keyword Cram
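The bias described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's method: it assumes a two-class response that is independent of both predictors, one predictor with a single possible split point and one with many, and an exhaustive search that keeps the split with the largest Pearson chi-square statistic. The function names (`chi2_2x2`, `best_split_stat`, `selection_rate`) and the simulation settings are hypothetical choices made for this illustration. It does not implement the P-value correction proposed in the paper.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # An empty row or column carries no information about association.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def best_split_stat(x, y):
    """Largest chi-square over all splits of the form 'x <= cut'."""
    # The largest observed value is skipped: it would leave the right node empty.
    cuts = sorted(set(x))[:-1]
    best = 0.0
    for cut in cuts:
        a = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 0)
        b = sum(1 for xi, yi in zip(x, y) if xi <= cut and yi == 1)
        c = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 0)
        d = sum(1 for xi, yi in zip(x, y) if xi > cut and yi == 1)
        best = max(best, chi2_2x2(a, b, c, d))
    return best


def selection_rate(n_samples=40, n_trials=200, seed=0):
    """Fraction of null trials in which the many-valued predictor is selected."""
    rng = random.Random(seed)
    wins_many = 0
    for _ in range(n_trials):
        # The class label is generated independently of both predictors.
        y = [rng.randint(0, 1) for _ in range(n_samples)]
        x_few = [rng.randint(0, 1) for _ in range(n_samples)]   # one split point
        x_many = [rng.random() for _ in range(n_samples)]       # many split points
        if best_split_stat(x_many, y) > best_split_stat(x_few, y):
            wins_many += 1
    return wins_many / n_trials


if __name__ == "__main__":
    print("many-valued predictor selected in", selection_rate(), "of trials")
```

Since neither predictor carries any information about the class, an unbiased selection rule would pick each of them about half the time. In this sketch the many-valued predictor is compared through the maximum of many chi-square values against a single chi-square value, so it tends to be selected more often than that.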
Y.-S. Shih
Added 17 Dec 2010
Updated 17 Dec 2010
Type Journal
Year 2004
Where CSDA
Authors Y.-S. Shih