Sciweavers

CIKM
2007
Springer

Spam filtering for short messages

We consider the problem of content-based spam filtering for short text messages that arise in three contexts: mobile (SMS) communication, blog comments, and email summary information such as might be displayed by a low-bandwidth client. Short messages often consist of only a few words, and therefore present a challenge to traditional bag-of-words based spam filters. Using three corpora of short messages and message fields derived from real SMS, blog, and spam messages, we evaluate feature-based and compression-model-based spam filters. We observe that bag-of-words filters can be improved substantially using different features, while compression-model filters perform quite well as-is. We conclude that content filtering for short messages is surprisingly effective.
Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Information filtering
General Terms: Experimentation, Measurement
Keywords: SMS, blog, spam, email, filtering, classification
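To illustrate the general idea behind the compression-model filters mentioned in the abstract, here is a minimal sketch. It is not the authors' implementation: it assumes Python's general-purpose `zlib` compressor as a stand-in for whatever compression models the paper evaluates, and the function names and the tiny example corpora are hypothetical. A message is assigned to the class whose training text lets it be compressed into the fewest additional bytes.

```python
import zlib

# Hypothetical toy training text; a real filter would use many labelled messages.
SPAM_CORPUS = (
    "WIN a FREE prize now! Text WIN to claim.\n"
    "Cheap loans approved instantly, reply YES.\n"
    "Congratulations, you have won a cash reward, call now.\n"
)
HAM_CORPUS = (
    "Are we still meeting for lunch at noon?\n"
    "Can you pick up milk on the way home\n"
    "Running late, see you in ten minutes.\n"
)


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(corpus: str, message: str) -> int:
    """Additional compressed bytes needed to append message to corpus.

    A small value suggests the message resembles text the compressor
    has already seen in the corpus.
    """
    return compressed_size(corpus + "\n" + message) - compressed_size(corpus)


def classify(message: str, spam_corpus: str, ham_corpus: str) -> str:
    """Label message with the class whose corpus compresses it best.

    Ties are resolved in favour of "ham" (an assumption of this sketch).
    """
    spam_cost = extra_bytes(spam_corpus, message)
    ham_cost = extra_bytes(ham_corpus, message)
    return "spam" if spam_cost < ham_cost else "ham"


if __name__ == "__main__":
    for msg in ("you have won a cash reward, call now",
                "Can you pick up milk on the way home"):
        print(msg, "->", classify(msg, SPAM_CORPUS, HAM_CORPUS))
```

Because the compressor works on raw character sequences rather than tokenized words, an approach of this kind needs no feature extraction, which may be one reason such filters cope with messages of only a few words.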
Added 07 Jun 2010
Updated 07 Jun 2010
Type Conference
Year 2007
Where CIKM
Authors Gordon V. Cormack, José María Gómez Hidalgo, Enrique Puertas Sanz