Abstract. Most of the work in machine learning assume that examples are generated at random according to some stationary probability distribution. In this work we study the problem...
Accurate probability-based ranking of instances is crucial in many real-world data mining applications. KNN (k-nearest neighbor) [1] has been intensively studied as an effective c...
In this paper, we investigate the role of a biomedical dataset on the classification accuracy of an algorithm. We quantify the complexity of a biomedical dataset using five complex...
While most software defects (i.e., bugs) are corrected and tested as part of the lengthy software development cycle, enterprise software vendors often have to release software pro...
Charles X. Ling, Victor S. Sheng, Tilmann F. W. Br...
Many data mining applications have a large amount of data but labeling data is often difficult, expensive, or time consuming, as it requires human experts for annotation. Semi-supe...