Automatic Text Categorization by Unsupervised Learning

The goal of text categorization is to classify documents into a fixed number of pre-defined categories. Previous work in this area has relied on a large number of labeled training documents for supervised learning. The problem is that labeled training documents are difficult to create: unlabeled documents are easy to collect, but categorizing them manually is not. In this paper, we propose an unsupervised learning method to overcome this difficulty. The proposed method divides the documents into sentences and categorizes each sentence using a keyword list for each category and a sentence similarity measure. It then uses the categorized sentences for training. The proposed method performs at a level similar to that of traditional supervised learning methods. It can therefore be used in areas where low-cost text categorization is needed, and it can also be used to create training documents.
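The sentence-labeling step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the similarity measure, the keyword lists, or any thresholds, so the cosine similarity over word counts, the stopword set, the `threshold` parameter, and all function names here are assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the paper's preprocessing is not described in the abstract.
STOPWORDS = {"the", "a", "an", "to", "in", "of", "and"}

def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    return [t for t in re.findall(r"[a-z]+", sentence.lower()) if t not in STOPWORDS]

def cosine(a, b):
    # Cosine similarity between two word-count vectors (an assumed measure).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def label_sentences(documents, keywords, threshold=0.2):
    sentences = [s for doc in documents for s in split_sentences(doc)]
    labeled, pending = [], []
    # Pass 1: label a sentence when exactly one category has the most keyword hits.
    for s in sentences:
        tokens = set(tokenize(s))
        hits = {c: len(tokens & kw) for c, kw in keywords.items()}
        best = max(hits, key=hits.get)
        if hits[best] > 0 and list(hits.values()).count(hits[best]) == 1:
            labeled.append((s, best))
        else:
            pending.append(s)
    # Pass 2: label remaining sentences by similarity to the keyword-labeled ones.
    seeds = [(Counter(tokenize(s)), c) for s, c in labeled]
    for s in pending:
        vec = Counter(tokenize(s))
        score, cat = max(((cosine(vec, sv), c) for sv, c in seeds), default=(0.0, None))
        if score >= threshold:
            labeled.append((s, cat))
    return labeled

documents = [
    "The striker scored a goal. The striker left the pitch early.",
    "The bank raised interest rates. Investors expect interest rates to rise.",
]
keywords = {"sports": {"match", "goal"}, "finance": {"bank", "stock"}}
for sentence, category in label_sentences(documents, keywords):
    print(category, "|", sentence)
```

The resulting (sentence, category) pairs play the role of the automatically created training data; any supervised text classifier could then be trained on them in place of manually labeled documents.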
Youngjoong Ko, Jungyun Seo
Added 01 Nov 2010
Updated 01 Nov 2010
Type Conference
Year 2000