
DMIN
2007

Generative Oversampling for Mining Imbalanced Datasets

One way to handle data mining problems where class prior probabilities and/or misclassification costs between classes are highly unequal is to resample the data until a new, desired class distribution in the training data is achieved. Many resampling techniques have been proposed in the past, and the relationship between resampling and cost-sensitive learning has been well studied. Surprisingly, however, few resampling techniques attempt to create new, artificial data points that generalize the known, labeled data. In this paper, we introduce an easily implementable resampling technique (generative oversampling) that creates new data points by learning from available training data. Empirically, we demonstrate that generative oversampling outperforms other well-known resampling methods on several datasets in the example domain of text classification.
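As a rough illustration of the idea in the abstract (learn a model from the minority-class training data, then sample new points from it), the following is a minimal sketch. It is not taken from the paper. It assumes documents are represented as term-count vectors and that a smoothed multinomial over terms is an acceptable generative model for the minority class. The function name `generative_oversample`, the smoothing parameter `alpha`, and the choice to resample document lengths from the observed minority documents are all hypothetical choices made for this example.

```python
import numpy as np

def generative_oversample(X_minority, n_new, alpha=1.0, seed=0):
    """Sample n_new synthetic term-count vectors from a multinomial
    fitted to the minority class (illustrative sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority)

    # Estimate term probabilities from the minority class, with additive
    # smoothing (alpha is an assumed hyperparameter) so that unseen terms
    # keep a small nonzero probability.
    term_totals = X_minority.sum(axis=0) + alpha
    term_probs = term_totals / term_totals.sum()

    # Assumption: each synthetic document reuses the length of a randomly
    # chosen real minority document.
    lengths = rng.choice(X_minority.sum(axis=1), size=n_new, replace=True)

    # Draw one multinomial sample of term counts per synthetic document.
    return np.vstack([rng.multinomial(int(n), term_probs) for n in lengths])

# Toy minority class: 3 documents over a 4-term vocabulary.
X_min = np.array([[2, 0, 1, 0],
                  [1, 1, 0, 0],
                  [3, 0, 2, 1]])
X_new = generative_oversample(X_min, n_new=5)
```

The synthetic rows in `X_new` would be appended to the minority-class training data until the desired class distribution is reached. Unlike duplicating existing minority examples, sampling from a fitted model can produce term combinations that do not appear in any single training document.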
Alexander Liu, Joydeep Ghosh, Cheryl Martin
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where DMIN
Authors Alexander Liu, Joydeep Ghosh, Cheryl Martin