An optimized approach for KNN text categorization using P-trees

Text mining matters because huge text databases hold a wealth of valuable information that needs to be mined. Text categorization is the process of assigning categories or labels to documents based entirely on their contents. Formally, it can be viewed as a mapping from the document space into a set of predefined class labels (also called subjects or categories): F: D → {C1, C2, …, Cn}, where F is the mapping function, D is the document space, and {C1, C2, …, Cn} is the set of class labels. Given an unlabeled document d, we need to find its class label Ci using the mapping function F, where F(d) = Ci. This paper proposes an optimized k-Nearest Neighbors (KNN) classifier that uses intervalization and P-tree technology to achieve a high degree of accuracy, space utilization, and time efficiency. As new samples arrive, the classifier finds the k nearest neighbors to the new sample from the training space without a single database scan. Categori...
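To make the mapping F(d) = Ci concrete, here is a minimal sketch of plain KNN text categorization in Python. It is not the authors' method: it uses neither intervalization nor P-trees, and it scans the whole training set linearly, which is exactly the cost the proposed classifier is designed to avoid. The bag-of-words vectors, the cosine similarity measure, the function names, and the toy training data are all assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(doc, training, k=3):
    """Play the role of F: return the majority label among the k
    training documents most similar to doc (linear scan, no index)."""
    query = vectorize(doc)
    scored = sorted(
        ((cosine(query, vectorize(text)), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled documents, invented for this example.
TRAINING = [
    ("stocks fell as markets reacted to interest rates", "finance"),
    ("the central bank raised interest rates again", "finance"),
    ("investors bought shares after strong earnings", "finance"),
    ("the team won the match with a late goal", "sports"),
    ("the striker scored a goal in the final match", "sports"),
    ("the coach praised the team after the game", "sports"),
]

if __name__ == "__main__":
    print(knn_classify("bank cut interest rates", TRAINING))
    print(knn_classify("late goal wins match", TRAINING))
```

Each call to `knn_classify` compares the query against every training document, so its cost grows with the size of the training set; the abstract's claim is that the P-tree representation lets the neighbors be found without such a scan.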
Imad Rahal, William Perrizo
Added 30 Jun 2010
Updated 30 Jun 2010
Type Conference
Year 2004
Where SAC
Authors Imad Rahal, William Perrizo