Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach

We investigate the performance of two machine learning algorithms in the context of anti-spam filtering. The increasing volume of unsolicited bulk e-mail (spam) has generated a need for reliable anti-spam filters. Filters of this type have so far been based mostly on keyword patterns that are constructed by hand and perform poorly. The Naive Bayesian classifier has recently been suggested as an effective method for automatically constructing anti-spam filters with superior performance. We thoroughly investigate the performance of the Naive Bayesian filter on a publicly available corpus, contributing towards standard benchmarks. At the same time, we compare the performance of the Naive Bayesian filter to an alternative memory-based learning approach, after introducing suitable cost-sensitive evaluation measures. Both methods achieve very accurate spam filtering, clearly outperforming the keyword-based filter of a widely used e-mail reader.
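To make the idea of a cost-sensitive Naive Bayesian filter concrete, the following is a minimal sketch and not the authors' implementation. It assumes a multinomial word-count model with add-one smoothing and a decision threshold derived from a cost ratio; the abstract specifies none of these details, and the names `train`, `spam_probability`, `is_spam`, and `cost_ratio` are hypothetical.

```python
import math
from collections import Counter

LABELS = ("spam", "legit")

def train(messages):
    """Build a model from (tokens, label) pairs; assumes both labels occur."""
    word_counts = {label: Counter() for label in LABELS}
    class_counts = Counter()
    for tokens, label in messages:
        class_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = set(word_counts["spam"]) | set(word_counts["legit"])
    return word_counts, class_counts, vocab

def spam_probability(tokens, model):
    """Posterior P(spam | tokens) under the naive independence assumption."""
    word_counts, class_counts, vocab = model
    total_messages = sum(class_counts.values())
    log_score = {}
    for label in LABELS:
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_messages)
        for tok in tokens:
            if tok in vocab:  # tokens never seen in training are ignored
                # Add-one (Laplace) smoothing: an assumption of this sketch.
                score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        log_score[label] = score
    # Normalise in log space to avoid underflow on long messages.
    top = max(log_score.values())
    weights = {label: math.exp(s - top) for label, s in log_score.items()}
    return weights["spam"] / (weights["spam"] + weights["legit"])

def is_spam(tokens, model, cost_ratio=1.0):
    """Flag as spam only if P(spam | tokens) exceeds cost_ratio / (1 + cost_ratio).

    cost_ratio expresses how many times worse it is to block a legitimate
    message than to let a spam message through (an assumed formulation).
    """
    threshold = cost_ratio / (1.0 + cost_ratio)
    return spam_probability(tokens, model) > threshold
```

Raising `cost_ratio` makes the filter more conservative: a message must look more strongly like spam before it is blocked, which trades missed spam for fewer lost legitimate messages.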

Anti-spam Filters | CORR 2000 | Education | Naive Bayesian Classifier | Naive Bayesian Filter

Added	17 Dec 2010
Updated	17 Dec 2010
Type	Journal
Year	2000
Where	CORR
Authors	Ion Androutsopoulos, Georgios Paliouras, Vangelis Karkaletsis, Georgios Sakkis, Constantine D. Spyropoulos, Panagiotis Stamatopoulos
