
ICML
2006
IEEE

Feature subset selection bias for classification learning

Feature selection is often applied to high-dimensional data before classification learning. Using the same training dataset for both selection and learning can result in so-called feature subset selection bias. This bias can putatively exacerbate overfitting and degrade classification performance. However, in current practice separate datasets are seldom used for selection and learning, because splitting the training data into two datasets, one for feature selection and one for classifier learning, reduces the amount of data available to either task. This work attempts to address this dilemma. We formalize selection bias for classification learning, analyze its statistical properties, and use various experiments to study the factors that affect selection bias and how the bias impacts classification learning. This research endeavors to illustrate and explain why the bias may not harm classification as much as expected in regress...
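The core effect described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' formalization or experimental setup. It assumes a filter-style selector that ranks features by absolute Pearson correlation with the label, and it uses pure-noise data so that no feature is truly predictive. The function names (`make_noise_data`, `abs_corr`, `select_top_k`, `selection_bias_demo`) are hypothetical. The selected features look relevant on the data used to pick them, but the same features score much lower on an independent sample from the same distribution. That optimistic gap is the kind of bias that arises when selection and learning share one dataset.

```python
import math
import random


def make_noise_data(n, p, rng):
    """Features and labels drawn independently (assumption: pure noise),
    so no feature carries real information about the label."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [rng.choice((-1, 1)) for _ in range(n)]
    return X, y


def abs_corr(X, y, j):
    """Absolute Pearson correlation between feature column j and the labels."""
    xs = [row[j] for row in X]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, y))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:  # degenerate column or single-class labels
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def select_top_k(X, y, k):
    """Hypothetical filter selector: keep the k features most correlated with y."""
    p = len(X[0])
    ranked = sorted(((abs_corr(X, y, j), j) for j in range(p)), reverse=True)
    return [j for _, j in ranked[:k]]


def selection_bias_demo(n=50, p=200, k=5, seed=0):
    """Return the mean score of the selected features on the selection data
    and on an independent sample drawn from the same distribution."""
    rng = random.Random(seed)
    X_sel, y_sel = make_noise_data(n, p, rng)  # data used for feature selection
    X_new, y_new = make_noise_data(n, p, rng)  # fresh data, never used for selection
    chosen = select_top_k(X_sel, y_sel, k)
    same = sum(abs_corr(X_sel, y_sel, j) for j in chosen) / k
    fresh = sum(abs_corr(X_new, y_new, j) for j in chosen) / k
    return same, fresh


if __name__ == "__main__":
    same, fresh = selection_bias_demo()
    print(f"score on selection data: {same:.3f}, on fresh data: {fresh:.3f}")
```

Running the script prints both scores; with many noise features and few samples, the first is typically several times larger than the second. Reusing `X_sel` to train a classifier on the chosen features would inherit this optimism, which is why a separate selection set removes the bias at the cost of less training data.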
Surendra K. Singhi, Huan Liu
Added 17 Nov 2009
Updated 17 Nov 2009
Type Conference
Year 2006
Where ICML
Authors Surendra K. Singhi, Huan Liu