AAAI
2006

Proposing a New Term Weighting Scheme for Text Categorization

In text categorization, term weighting methods assign appropriate weights to terms to improve classification performance. In this study, we propose an effective term weighting scheme, tf.rf, and investigate several widely used unsupervised and supervised term weighting methods on two popular data collections in combination with the SVM and kNN algorithms. Our controlled experiments show that not all supervised term weighting methods are consistently superior to unsupervised ones. Specifically, the three supervised methods based on information theory, i.e. tf.χ², tf.ig and tf.or, perform rather poorly in all experiments. On the other hand, our proposed tf.rf consistently achieves the best performance and outperforms the other methods substantially and significantly. The widely used tf.idf method does not perform uniformly well across different data corpora.
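The abstract does not spell out how tf.rf is computed. As a rough illustration only, the sketch below assumes the relevance frequency form commonly attributed to this line of work, rf = log2(2 + a / max(1, c)), where a and c are the numbers of positive-category and negative-category training documents containing the term. The function names, the set-based document representation, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def relevance_frequency(a, c):
    """Assumed rf form: log2(2 + a / max(1, c)).

    a: number of positive-category documents containing the term
    c: number of negative-category documents containing the term
    """
    return math.log2(2 + a / max(1, c))


def tf_rf_weights(doc_tokens, pos_docs, neg_docs):
    """Weight each term of one document by raw term frequency times rf.

    pos_docs and neg_docs are lists of sets of terms (hypothetical format).
    """
    tf = Counter(doc_tokens)
    weights = {}
    for term, count in tf.items():
        a = sum(1 for d in pos_docs if term in d)
        c = sum(1 for d in neg_docs if term in d)
        weights[term] = count * relevance_frequency(a, c)
    return weights


# Toy data, invented for illustration only.
pos_docs = [{"wheat", "price"}, {"wheat", "farm"}]
neg_docs = [{"price", "stock"}]
weights = tf_rf_weights(["wheat", "wheat", "price"], pos_docs, neg_docs)
```

In this toy setup, "wheat" occurs only in positive-category documents and so receives a larger rf factor than "price", which is spread evenly across both categories. Unlike idf, the factor depends on the category labels, which is what makes the scheme supervised.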
Man Lan, Chew Lim Tan, Hwee-Boon Low
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2006
Where AAAI