
ADMA 2005, Springer

Learning k-Nearest Neighbor Naive Bayes for Ranking

Accurate probability-based ranking of instances is crucial in many real-world data mining applications. KNN (k-nearest neighbor) [1] has been intensively studied as an effective classification model for decades, but its ranking performance is largely unknown. In this paper, we conduct a systematic study of the ranking performance of KNN. First, we compare KNN and distance-weighted KNN (KNNDW) to decision trees and naive Bayes in ranking, measured by AUC (the area under the Receiver Operating Characteristic curve). We then propose to improve the ranking performance of KNN by combining it with naive Bayes (NB for short). The idea is that a naive Bayes model is learned using the k nearest neighbors of the test instance as the training data and is then used to classify that instance. A critical problem in combining KNN with naive Bayes is the lack of training data when k is small. We propose to deal with this by using cloning to expand the training data. That is, each of the k nearest neighbors...
Liangxiao Jiang, Harry Zhang, Jiang Su
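
A minimal sketch of the local-learning idea described in the abstract, using scikit-learn: for each test instance, fit a naive Bayes model on its k nearest neighbors and rank by the resulting class-probability estimate. This is an illustration, not the authors' exact method; GaussianNB, the default distance metric, the value of k, and the helper name knn_naive_bayes_proba are assumptions, and the cloning step for small k is omitted because the abstract is truncated.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors

def knn_naive_bayes_proba(X_train, y_train, X_test, k=30):
    """Estimate P(positive class | x) for each test instance with a naive
    Bayes model trained only on that instance's k nearest neighbors."""
    X_train = np.asarray(X_train)
    y_train = np.asarray(y_train)
    X_test = np.asarray(X_test)
    positive = np.unique(y_train)[-1]            # treat the largest label as the positive class
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    scores = np.empty(len(X_test))
    for i, x in enumerate(X_test):
        # indices of the k nearest training instances to this test instance
        idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
        # local naive Bayes trained on the neighborhood only
        nb = GaussianNB().fit(X_train[idx], y_train[idx])
        proba = dict(zip(nb.classes_, nb.predict_proba(x.reshape(1, -1))[0]))
        scores[i] = proba.get(positive, 0.0)     # 0 if no positive neighbors
    return scores

With binary labels, sklearn.metrics.roc_auc_score(y_test, knn_naive_bayes_proba(X_train, y_train, X_test)) gives the AUC measure used in the paper to compare ranking performance.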
Type Conference
Year 2005
Where ADMA
Authors Liangxiao Jiang, Harry Zhang, Jiang Su