Sciweavers

WWW
2005
ACM

A comprehensive comparative study on term weighting schemes for text categorization with support vector machines

Term weighting, which converts documents into vectors in the term space, is a vital step in automatic text categorization. In this paper, we conducted comprehensive experiments to compare various term weighting schemes with SVM on two widely used benchmark data sets. We also presented a new term weighting scheme, tf.rf, to improve the term's discriminating power. The controlled experimental results showed that the newly proposed tf.rf scheme is significantly better than other widely used term weighting schemes. Compared with schemes based on the tf factor alone, the idf factor does not improve, and may even decrease, the term's discriminating power for text categorization.
Categories and Subject Descriptors I.7 [Document and Text Processing]: Document Preparation
General Terms Performance
Keywords term weighting schemes, text categorization, SVM
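As a rough illustration of what a term weighting scheme does, the sketch below weights a tiny invented corpus three ways: tf alone, tf.idf, and tf.rf. The abstract does not define the rf factor, so the code assumes the form rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category documents containing the term. That formula, the function names, and the toy documents are all assumptions made for this example, not details taken from the listing.

```python
import math
from collections import Counter

# Toy corpus (invented for illustration): tokenized documents and their labels.
docs = [
    ["wheat", "price", "rise"],
    ["wheat", "crop", "price"],
    ["bank", "price", "rate"],
    ["bank", "loan", "price"],
]
labels = ["grain", "grain", "finance", "finance"]
doc_sets = [set(d) for d in docs]


def idf(term, doc_sets):
    """Inverse document frequency: log(N / number of documents containing term)."""
    n_containing = sum(1 for d in doc_sets if term in d)
    return math.log(len(doc_sets) / n_containing) if n_containing else 0.0


def rf(term, doc_sets, labels, category):
    """Relevance frequency, ASSUMED here to be log2(2 + a / max(1, c))."""
    a = sum(1 for d, y in zip(doc_sets, labels) if y == category and term in d)
    c = sum(1 for d, y in zip(doc_sets, labels) if y != category and term in d)
    return math.log2(2 + a / max(1, c))


def weight_document(tokens, factor):
    """Multiply each term's raw frequency (tf) by a per-term factor."""
    tf = Counter(tokens)
    return {term: count * factor(term) for term, count in tf.items()}


doc = docs[0]
tf_only = weight_document(doc, lambda t: 1.0)
tf_idf = weight_document(doc, lambda t: idf(t, doc_sets))
tf_rf = weight_document(doc, lambda t: rf(t, doc_sets, labels, "grain"))

for name, vec in [("tf", tf_only), ("tf.idf", tf_idf), ("tf.rf", tf_rf)]:
    print(name, {t: round(w, 3) for t, w in vec.items()})
```

In this toy corpus "price" occurs in every document, so idf sets its weight to zero. The assumed rf factor instead depends on how the term is distributed between the positive and negative categories, so "wheat", which appears only in the "grain" documents, receives a larger weight than "price".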
Man Lan, Chew Lim Tan, Hwee-Boon Low, Sam Yuan Sung
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2005
Where WWW
Authors Man Lan, Chew Lim Tan, Hwee-Boon Low, Sam Yuan Sung