Sciweavers

CEAS
2006
Springer

Spam Filtering with Naive Bayes - Which Naive Bayes?

Naive Bayes is very popular in commercial and open-source anti-spam e-mail filters. There are, however, several forms of Naive Bayes, something the anti-spam literature does not always acknowledge. We discuss five different versions of Naive Bayes and compare them on six new, non-encoded datasets that contain ham messages of particular Enron users and fresh spam messages. The new datasets, which we make publicly available, are more realistic than previous comparable benchmarks because they maintain the temporal order of the messages in the two categories and emulate the varying proportion of spam and ham messages that users receive over time. We adopt an experimental procedure that emulates the incremental training of personalized spam filters, and we plot ROC curves that allow us to compare the different versions of Naive Bayes (NB) over the entire tradeoff between true positives and true negatives.
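As a rough illustration of the kind of setup the abstract describes, the sketch below shows one possible form of Naive Bayes (multinomial, with term frequencies and Laplace smoothing) that is updated one message at a time, plus a helper that sweeps a decision threshold to obtain points on the tradeoff between true positives (spam recall) and true negatives (ham recall). It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not spell out the five versions compared, and the names `IncrementalMultinomialNB`, `spam_probability`, and `roc_points`, as well as the smoothing choices, are hypothetical.

```python
import math
from collections import Counter


class IncrementalMultinomialNB:
    """Toy multinomial Naive Bayes (term frequencies, Laplace smoothing).

    Illustrative only; class and method names are assumptions.
    """

    def __init__(self):
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}
        self.vocabulary = set()

    def update(self, tokens, label):
        # Incremental training: fold one labeled message into the counts.
        self.token_counts[label].update(tokens)
        self.message_counts[label] += 1
        self.vocabulary.update(tokens)

    def _log_score(self, tokens, label):
        # log P(label) + sum over tokens of log P(token | label).
        total_messages = sum(self.message_counts.values())
        prior = (self.message_counts[label] + 1) / (total_messages + 2)
        denom = sum(self.token_counts[label].values()) + len(self.vocabulary)
        score = math.log(prior)
        for token in tokens:
            if token in self.vocabulary:  # tokens never seen in training are ignored
                score += math.log((self.token_counts[label][token] + 1) / denom)
        return score

    def spam_probability(self, tokens):
        log_spam = self._log_score(tokens, "spam")
        log_ham = self._log_score(tokens, "ham")
        # Normalize in log space to avoid underflow on long messages.
        peak = max(log_spam, log_ham)
        exp_spam = math.exp(log_spam - peak)
        exp_ham = math.exp(log_ham - peak)
        return exp_spam / (exp_spam + exp_ham)


def roc_points(scored, thresholds):
    """Return (threshold, spam_recall, ham_recall) for each threshold.

    `scored` is a list of (spam_probability, true_label) pairs and must
    contain at least one spam and one ham message.
    """
    n_spam = sum(1 for _, label in scored if label == "spam")
    n_ham = len(scored) - n_spam
    points = []
    for t in thresholds:
        true_pos = sum(1 for p, label in scored if label == "spam" and p > t)
        true_neg = sum(1 for p, label in scored if label == "ham" and p <= t)
        points.append((t, true_pos / n_spam, true_neg / n_ham))
    return points
```

To mimic incremental training on a time-ordered stream, one could call `spam_probability` on each incoming message first, record the score with the true label, and only then call `update` with that message; the recorded pairs can be passed to `roc_points` with a range of thresholds.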
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2006
Where CEAS
Authors Vangelis Metsis, Ion Androutsopoulos, Georgios Paliouras