Sciweavers

ICMLC
2010
Springer

A comparative study on two large-scale hierarchical text classification tasks' solutions

Abstract: Patent classification is a large-scale hierarchical text classification (LSHTC) task. Although learning algorithms and feature selection strategies have been compared comprehensively in the text categorization field, little work has been done on LSHTC tasks because of their high computational cost and complicated structural label characteristics. For the first time, this paper compares two popular learning frameworks, hierarchical support vector machine (SVM) and k-nearest neighbor (k-NN), applied to an LSHTC task. Experimental results show that the latter outperforms the former on this LSHTC task, which is quite different from the usual results for normal text categorization tasks. The paper then compares different similarity measures and ranking approaches within the k-NN framework for the LSHTC task. The conclusions are that k-NN is more appropriate for the LSHTC task than hierarchical SVM and that, for a specific LSHTC task, BM25 outperforms other similarity ...
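To make the k-NN setup concrete, the sketch below scores every training document against a query document with BM25, keeps the k best neighbours, and ranks candidate labels by summed neighbour score. It is only an illustration: the abstract does not state which BM25 variant, parameter values, or ranking approach the authors used, so the common Okapi BM25 form, the defaults `k1=1.2` and `b=0.75`, the score-summing vote, the function name `bm25_knn`, and the toy data are all assumptions.

```python
import math
from collections import Counter, defaultdict


def bm25_knn(train_docs, train_labels, query, k=3, k1=1.2, b=0.75):
    """Rank labels for `query` using its k BM25-nearest training documents.

    Documents and the query are lists of tokens. This is a hypothetical
    helper, not the method described in the paper.
    """
    n = len(train_docs)
    avg_len = sum(len(d) for d in train_docs) / n
    # Document frequency: number of training documents containing each term.
    df = Counter(t for d in train_docs for t in set(d))

    scores = []
    for idx, doc in enumerate(train_docs):
        tf = Counter(doc)
        score = 0.0
        for term in set(query):
            if term not in tf:
                continue
            # Okapi BM25 with the non-negative "+1" idf form (an assumption).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((score, idx))

    # Keep the k highest-scoring neighbours (ties broken by document index).
    neighbours = sorted(scores, key=lambda s: (-s[0], s[1]))[:k]

    # Each neighbour votes for its label with a weight equal to its score.
    votes = defaultdict(float)
    for score, idx in neighbours:
        votes[train_labels[idx]] += score
    return sorted(votes, key=lambda label: -votes[label])


# Toy corpus with made-up category labels, for illustration only.
train_docs = [
    ["engine", "fuel", "piston"],
    ["engine", "valve", "fuel"],
    ["protein", "gene", "cell"],
    ["gene", "sequence", "cell"],
]
train_labels = ["F02", "F02", "C12", "C12"]

ranked = bm25_knn(train_docs, train_labels, ["fuel", "engine", "injector"], k=2)
print(ranked)
```

In a hierarchical setting the labels would be leaf categories of the taxonomy, and a different ranking approach would replace only the voting step at the end of the function.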
Jian Zhang, Hai Zhao, Bao-Liang Lu
Added 12 Feb 2011
Updated 12 Feb 2011
Type Journal
Year 2010
Where ICMLC
Authors Jian Zhang, Hai Zhao, Bao-Liang Lu