Learning to classify e-mail

In this paper we study supervised and semi-supervised classification of e-mails. We consider two tasks: filing e-mails into folders and spam e-mail filtering. First, in a supervised learning setting, we investigate the use of random forest for automatic e-mail filing into folders and spam e-mail filtering. We show that random forest is a good choice for these tasks because it runs fast on large and high-dimensional databases, is easy to tune and is highly accurate, outperforming popular algorithms such as decision trees, support vector machines and naïve Bayes. We introduce a new accurate feature selector with linear time complexity. Second, we examine the applicability of the semi-supervised co-training paradigm for spam e-mail filtering by employing random forests, support vector machines, decision trees and naïve Bayes as base classifiers. The study shows that a classifier trained on a small set of labelled examples can be successfully boosted using unlabelled examples ...
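The abstract names the co-training paradigm without spelling out how it works. The following is a minimal sketch of generic two-view co-training in the style of Blum and Mitchell's original formulation, not the authors' experimental setup. Assumptions, all hypothetical: each e-mail is split into two word-bag "views" (say, subject words and body words), naïve Bayes (one of the four base classifiers listed above) is the learner, and the `co_train`, `predict`, `rounds` and `per_class` names and the toy data are made up for illustration. In each round, the classifier trained on one view pseudo-labels its most confident unlabelled e-mail for each class. Those e-mails then join the shared labelled set that the other view's classifier trains on.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over bags of words, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_scores(self, doc):
        # Unnormalised log posteriors; words unseen in training are ignored.
        v = len(self.vocab)
        return {
            c: self.log_prior[c] + sum(
                math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
                for w in doc if w in self.vocab)
            for c in self.classes
        }

    def predict_with_confidence(self, doc):
        scores = self.log_scores(doc)
        best = max(scores, key=scores.get)
        # Posterior probability of the winning class (log-sum-exp normalisation).
        z = sum(math.exp(s - scores[best]) for s in scores.values())
        return best, 1.0 / z


def train_view(labelled, view):
    # labelled: list of ((view0_words, view1_words), label)
    return NaiveBayes().fit([x[view] for x, _ in labelled], [y for _, y in labelled])


def co_train(labelled, unlabelled, rounds=3, per_class=1):
    """Hypothetical co-training loop; returns both view classifiers and the grown labelled set."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            clf = train_view(labelled, view)
            preds = [(clf.predict_with_confidence(x[view]), i) for i, x in enumerate(pool)]
            chosen = {}
            for c in clf.classes:
                # Most confident pool items predicted as class c by this view.
                ranked = sorted((conf, i) for (label, conf), i in preds if label == c)
                for _, i in ranked[-per_class:]:
                    chosen[i] = c
            # Pop from the highest index down so the remaining indices stay valid.
            for i in sorted(chosen, reverse=True):
                labelled.append((pool.pop(i), chosen[i]))
    return (train_view(labelled, 0), train_view(labelled, 1)), labelled


def predict(models, email):
    # Combine the two views by summing their per-class log scores.
    totals = {}
    for view, clf in enumerate(models):
        for c, s in clf.log_scores(email[view]).items():
            totals[c] = totals.get(c, 0.0) + s
    return max(totals, key=totals.get)


# Hypothetical toy data: (subject words, body words).
LABELLED = [
    ((["win", "cash"], ["prize", "click"]), "spam"),
    ((["meeting", "agenda"], ["project", "notes"]), "ham"),
]
UNLABELLED = [
    (["win", "prize"], ["click", "offer"]),
    (["agenda", "report"], ["notes", "team"]),
    (["cash", "offer"], ["prize", "free"]),
    (["meeting", "team"], ["project", "report"]),
]

models, grown = co_train(LABELLED, UNLABELLED)
```

With two labelled and four unlabelled e-mails, the loop pseudo-labels all four toy e-mails. The combined model then labels `(["free", "cash"], ["offer", "win"])` as `"spam"`. Adding one example per class per view, instead of simply the most confident examples overall, keeps a single class from swamping the growing labelled set.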
Added 15 Dec 2010
Updated 15 Dec 2010
Type Journal
Year 2007
Where ISCI
Authors Irena Koprinska, Josiah Poon, James Clark, Jason Chan