Filtering Email Spam in the Presence of Noisy User Feedback

Recent email spam filtering evaluations, such as those conducted at TREC, have shown that near-perfect filtering results are attained with a variety of machine learning methods when filters are given perfectly accurate labeling feedback for training. Yet in real-world settings, labeling feedback may be far from perfect. Real users give feedback that is often mistaken, inconsistent, or even maliciously inaccurate. To our knowledge, the impact of this noisy labeling feedback on current spam filtering methods has not been previously explored in the literature. In this paper, we show that noisy feedback may harm or even break state-of-the-art spam filters, including recent TREC winners. We then propose and evaluate several approaches to make such filters robust to label noise. We find that although such modifications are effective for uniform random label noise, more realistic "natural" label noise from human users remains a difficult challenge.
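As an illustration of what "uniform random label noise" could mean in practice, the sketch below flips each training label independently with a fixed probability. It is only a minimal sketch under stated assumptions: the function name `flip_labels`, the `noise_rate` and `seed` parameters, and the "spam"/"ham" label strings are hypothetical choices made here, not taken from the paper, and this is not the authors' experimental setup.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Return a copy of `labels` with uniform random label noise.

    Each label is flipped (spam <-> ham) independently with probability
    `noise_rate`. All names here are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the noise is reproducible
    opposite = {"spam": "ham", "ham": "spam"}
    return [opposite[y] if rng.random() < noise_rate else y for y in labels]


# Hypothetical clean feedback: 1000 messages, alternating spam and ham.
clean = ["spam", "ham"] * 500
noisy = flip_labels(clean, noise_rate=0.10)

# Fraction of labels that were actually flipped; it should be near noise_rate.
observed = sum(a != b for a, b in zip(clean, noisy)) / len(clean)
print(f"observed flip rate: {observed:.3f}")
```

Noise of this kind is independent of message content. The "natural" noise the abstract describes comes from real users and need not behave this way, so a sketch like this does not model it.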
D. Sculley, Gordon V. Cormack
Added 12 Oct 2010
Updated 12 Oct 2010
Type Conference
Year 2008
Where CEAS
Authors D. Sculley, Gordon V. Cormack