Sciweavers

ERCIMDL
2006
Springer

Comparing and Combining Two Approaches to Automated Subject Classification of Text

A machine-learning approach and a string-matching approach to automated subject classification of text were compared with respect to their performance, advantages and downsides. The former was based on an SVM algorithm, while the latter relied on string-matching between a controlled vocabulary and the words in the text to be classified. The data collection consisted of a subset of Compendex, classified into six different classes. SVM was shown to outperform the string-matching approach on average; our hypothesis that SVM yields better recall and string-matching better precision was confirmed for only one of the classes. Because the two approaches are complementary, we investigated different ways of combining them, based on combining their vocabularies. The results showed that the original approaches, i.e. the machine-learning approach without background knowledge from the controlled vocabulary and the string-matching approach based on the controlled vocabulary, outperform approaches in which combinatio...
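As a rough illustration of the string-matching idea described in the abstract, the following minimal sketch matches terms from a controlled vocabulary against the words of a text and ranks classes by the number of matches. It is not the authors' implementation: the vocabulary entries, the class labels, the function name `classify_by_string_matching` and the scoring rule (one point per matched term occurrence) are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary mapping a term to a class label.
# Terms and labels are invented for illustration; they are not from the paper.
VOCABULARY = {
    "heat transfer": "thermodynamics",
    "entropy": "thermodynamics",
    "support vector machine": "computer science",
    "information retrieval": "computer science",
}


def classify_by_string_matching(text, vocabulary=VOCABULARY):
    """Rank class labels by how often their vocabulary terms occur in the text.

    Assumed scoring rule: each whole-word occurrence of a term adds one
    point to the class that the term belongs to.
    """
    # Lowercase and keep only alphanumeric tokens so that punctuation
    # and case differences do not block a match.
    normalized = " ".join(re.findall(r"[a-z0-9]+", text.lower()))
    scores = Counter()
    for term, label in vocabulary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, normalized))
        if hits:
            scores[label] += hits
    # Highest-scoring classes first; an empty list means no term matched.
    return scores.most_common()


ranking = classify_by_string_matching(
    "Entropy and heat transfer in a support vector machine study."
)
```

A text that matches no vocabulary term receives no class at all under this sketch, which is one plausible reason a string-matching classifier might be expected to trade recall for precision.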
Added 22 Aug 2010
Updated 22 Aug 2010
Type Conference
Year 2006
Where ERCIMDL
Authors Koraljka Golub, Anders Ardö, Dunja Mladenic, Marko Grobelnik