Background: Data generated using `omics' technologies are characterized by high dimensionality, where the number of features measured per subject vastly exceeds the number of...
Yu Guo, Armin Graber, Robert N. McBurney, Raji Bal...
Given high-dimensional software measurement data, researchers and practitioners often use feature (metric) selection techniques to improve the performance of software quality clas...
Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hu...
Learning from imbalanced datasets presents a convoluted problem both from the modeling and cost standpoints. In particular, when a class is of great interest but occurs relatively...
Nitesh V. Chawla, David A. Cieslak, Lawrence O. Ha...
Rescaling is possibly the most popular approach to cost-sensitive learning. This approach works by rescaling the classes according to their costs, and it can be realized in differ...
Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that e...
Arthur Zimek, Fabian Buchwald, Eibe Frank, Stefan ...