Boosting support vector machines for imbalanced data sets

Real-world data mining applications must address the issue of learning from imbalanced data sets. The problem occurs when the number of instances in one class greatly outnumbers the number of instances in the other class. Such data sets often cause a default classifier to be built, due to skewed vector spaces or lack of information. Common approaches for dealing with the class imbalance problem involve modifying the data distribution or modifying the classifier. In this work, we use a combination of both approaches. We use support vector machines with soft margins as the base classifier to solve the skewed vector spaces problem. Then we use a boosting algorithm to obtain an ensemble classifier that has lower error than a single classifier. We found that this ensemble of SVMs substantially improves prediction performance, not only for the majority class, but also for the minority class.

Benjamin X. Wang, Nathalie Japkowicz
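
The idea described in the abstract, boosting soft-margin SVMs so that later rounds concentrate on misclassified (often minority-class) examples, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only says "a boosting algorithm", so the AdaBoost-style reweighting used here is an assumption, and a linear SVM trained by subgradient descent on the weighted hinge loss stands in for whatever SVM solver and kernel the paper uses. All function names, parameters, and the toy data set are hypothetical.

```python
import math

def svm_score(w, b, x):
    """Signed decision value w.x + b of a linear SVM."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, sample_w, C=1.0, epochs=200, lr=0.01):
    """Soft-margin linear SVM: minimise 0.5*||w||^2 + C * weighted hinge loss
    by plain subgradient descent (assumed stand-in for a real SVM solver)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5*||w||^2 term
        for xi, yi, si in zip(X, y, sample_w):
            if yi * svm_score(w, b, xi) < 1:  # point violates the margin
                scale = C * n * si * yi       # n*si equals 1 for uniform weights
                for j in range(d):
                    grad_w[j] -= scale * xi[j]
                grad_b -= scale
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def boost_svms(X, y, rounds=5, C=1.0):
    """AdaBoost-style loop (assumed variant) over weighted linear SVMs."""
    n = len(X)
    sample_w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        w, b = train_linear_svm(X, y, sample_w, C=C)
        preds = [1 if svm_score(w, b, xi) >= 0 else -1 for xi in X]
        err = sum(s for s, p, t in zip(sample_w, preds, y) if p != t)
        if err >= 0.5:          # no better than chance: stop boosting
            break
        perfect = err == 0.0
        err = max(err, 1e-10)   # avoid division by zero in alpha
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, w, b))
        if perfect:             # nothing left to reweight
            break
        # Increase the weight of misclassified points, then renormalise.
        sample_w = [s * math.exp(-alpha * t * p)
                    for s, p, t in zip(sample_w, preds, y)]
        total = sum(sample_w)
        sample_w = [s / total for s in sample_w]
    return ensemble

def ensemble_predict(ensemble, x):
    """Weighted vote of the boosted SVMs; returns +1 or -1."""
    vote = sum(alpha * (1 if svm_score(w, b, x) >= 0 else -1)
               for alpha, w, b in ensemble)
    return 1 if vote >= 0 else -1

# Hypothetical imbalanced toy data: 18 majority points (-1), 3 minority points (+1).
majority = [(0.1 * i, 0.1 * j) for i in range(6) for j in range(3)]
minority = [(2.0, 2.0), (2.2, 1.8), (1.8, 2.2)]
X = majority + minority
y = [-1] * len(majority) + [1] * len(minority)

ensemble = boost_svms(X, y)
print(ensemble_predict(ensemble, (0.0, 0.0)), ensemble_predict(ensemble, (2.0, 2.0)))
```

On real imbalanced data, overall accuracy is a poor yardstick; per-class recall or a similar class-sensitive measure would be the more informative thing to compare between a single SVM and the boosted ensemble.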
