CEAS
2006
Springer

Learning at Low False Positive Rates

Most spam filters are configured for use at a very low false-positive rate. Typically, however, the filters are trained with techniques that optimize accuracy or entropy rather than performance in this configuration. We describe two different techniques for optimizing for the low false-positive region. One method weights good (non-spam) messages more heavily than spam. The other is a two-stage technique that first finds the data lying in the low false-positive region and then learns from this subset. We show that with two different learning algorithms, logistic regression and Naive Bayes, we achieve substantial improvements, reducing missed spam by as much as 20% (relative) for logistic regression and 40% for Naive Bayes at the same low false-positive rate.
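The abstract names two ways of targeting the low false-positive region. The Python sketch below illustrates both under stated assumptions; it is not the authors' implementation. It uses plain batch-gradient-descent logistic regression only (Naive Bayes is not shown), and every function and parameter name (`train_logreg`, `good_weight`, `threshold_at_fp`, `missed_spam_at_fp`, `two_stage`, `stage1_fp`) is hypothetical. Method 1 is modeled as multiplying each good message's log-loss by `good_weight`. Method 2 follows one plausible reading of "first finding data in the low false-positive region": train on all data, keep only the training messages that score above the cut for a chosen false-positive rate, and retrain on that subset. The evaluation helper reports missed spam at a fixed false-positive rate, the operating point the abstract measures.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def score(w, b, x):
    # Linear score; higher means "more spam-like".
    return b + sum(wj * xj for wj, xj in zip(w, x))

def train_logreg(X, y, good_weight=1.0, epochs=200, lr=0.1):
    """Weighted logistic regression by batch gradient descent (hypothetical helper).
    y[i] is 1 for spam, 0 for good mail; each good example's log-loss is
    multiplied by good_weight (method 1: weight good data more than spam)."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb, total = [0.0] * d, 0.0, 0.0
        for xi, yi in zip(X, y):
            c = 1.0 if yi == 1 else good_weight
            g = c * (sigmoid(score(w, b, xi)) - yi)
            for j in range(d):
                gw[j] += g * xi[j]
            gb += g
            total += c
        # Step on the weighted mean gradient.
        w = [wj - lr * gj / total for wj, gj in zip(w, gw)]
        b -= lr * gb / total
    return w, b

def threshold_at_fp(scores, labels, fp_rate):
    # Cut at the (k+1)-th highest good score, k = floor(fp_rate * #good), so at
    # most k good messages score above it (assumes 0 <= fp_rate < 1).
    good = sorted((s for s, l in zip(scores, labels) if l == 0), reverse=True)
    return good[int(fp_rate * len(good))]

def missed_spam_at_fp(scores, labels, fp_rate):
    # Fraction of spam NOT flagged (score <= cut) at the given FP rate.
    cut = threshold_at_fp(scores, labels, fp_rate)
    spam = [s for s, l in zip(scores, labels) if l == 1]
    return sum(1 for s in spam if s <= cut) / len(spam)

def two_stage(X, y, stage1_fp=0.2, good_weight=1.0):
    """Method 2 sketch (hypothetical): train on everything, keep only training
    examples that stage 1 scores above the stage1_fp cut, retrain on them."""
    w1, b1 = train_logreg(X, y, good_weight)
    s1 = [score(w1, b1, x) for x in X]
    cut = threshold_at_fp(s1, y, stage1_fp)
    keep = [i for i, s in enumerate(s1) if s > cut]
    Xs, ys = [X[i] for i in keep], [y[i] for i in keep]
    if len(set(ys)) < 2:
        # Subset lacks one class: fall back to the stage-1 model.
        return lambda x: score(w1, b1, x)
    w2, b2 = train_logreg(Xs, ys, good_weight)

    def predict(x):
        # Messages that stage 1 places at or below the cut are treated as good.
        if score(w1, b1, x) <= cut:
            return float("-inf")
        return score(w2, b2, x)
    return predict
```

On a small symmetric set such as `X = [[-2.0], [-1.0], [0.5], [2.0], [1.0], [-0.5]]` with `y = [0, 0, 0, 1, 1, 1]`, raising `good_weight` from 1 to 10 drives the intercept `b` negative, so fewer good messages cross any fixed threshold; that shift is the intended effect of method 1.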
Wen-tau Yih, Joshua Goodman, Geoff Hulten
Added 20 Aug 2010
Updated 20 Aug 2010
Type Conference
Year 2006
Where CEAS
Authors Wen-tau Yih, Joshua Goodman, Geoff Hulten