Sciweavers

BMCBI
2010

Class prediction for high-dimensional class-imbalanced data

13 years 4 months ago
Class prediction for high-dimensional class-imbalanced data
Background: The goal of class prediction studies is to develop rules to accurately predict the class membership of new samples. The rules are derived using the values of the variables available for each subject: the main characteristic of high-dimensional data is that the number of variables greatly exceeds the number of samples. Frequently the classifiers are developed using class-imbalanced data, i.e., data sets where the number of samples in each class is not equal. Standard classification methods used on class-imbalanced data often produce classifiers that do not accurately predict the minority class; the prediction is biased towards the majority class. In this paper we investigate if the high-dimensionality poses additional challenges when dealing with class-imbalanced prediction. We evaluate the performance of six types of classifiers on class-imbalanced data, using simulated data and a publicly available data set from a breast cancer gene-expression microarray study. We also in...
Rok Blagus, Lara Lusa
Added 08 Dec 2010
Updated 08 Dec 2010
Type Journal
Year 2010
Where BMCBI
Authors Rok Blagus, Lara Lusa
Comments (0)