CORR
2004
Springer

"In vivo" spam filtering: A challenge problem for data mining

Spam, also known as Unsolicited Commercial Email (UCE), is the bane of email communication. Many data mining researchers have addressed the problem of detecting spam, generally by treating it as a static text classification problem. True in vivo spam filtering has characteristics that make it a rich and challenging domain for data mining. Indeed, real-world datasets with these characteristics are typically difficult to acquire and to share. This paper demonstrates some of these characteristics and argues that researchers should pursue in vivo spam filtering as an accessible domain for investigating them.
General Terms: spam, text classification, challenge problems, class skew, imbalanced data, cost-sensitive learning, data streams, concept drift
Tom Fawcett
Added 17 Dec 2010
Updated 17 Dec 2010
Type Journal
Year 2004
Where CORR
Authors Tom Fawcett