
COLING
2008

An Improved Hierarchical Bayesian Model of Language for Document Classification

This paper addresses the fundamental problem of document classification, focusing on problems where the classes are mutually exclusive. We advocate an approximate sampling distribution for word counts in documents and demonstrate that the resulting model can outperform both the simple multinomial and more recently proposed extensions on the classification task. We also compare the classifiers to a linear SVM and show that, provided certain conditions are met, the new model exceeds the performance of the SVM and achieves results among the very best published on the Newsgroups classification task.
Ben Allison
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where COLING
Authors Ben Allison