Time-efficient spam e-mail filtering using n-gram models

In this paper, we propose spam e-mail filtering methods that have high accuracy and low time complexity. The methods are based on the n-gram approach and a heuristic referred to as the first n-words heuristic. We develop two models, a class general model and an e-mail specific model, and test the methods under these models. The models are then combined in such a way that the latter is activated in the cases where the first model falls short. Though the proposed approach and the methods developed are general and can be applied to any language, we mainly apply them to Turkish, which is an agglutinative language, and examine some properties of the language. Extensive tests were performed, and success rates of about 98% for Turkish and 99% for English were obtained. It has been shown that the time complexity can be reduced significantly without sacrificing performance.
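As a rough illustration of the ideas named in the abstract, the sketch below scores a message under one n-gram model per class and looks only at the first few words of each message. It is not the authors' implementation: the bigram order, add-one smoothing, whitespace tokenization, the default of 50 words, and all names (`first_n_words`, `BigramClassModel`, `classify`) are assumptions made for this example. The e-mail specific model and the way the two models are combined are not shown.

```python
import math
from collections import Counter

def first_n_words(text, n_words):
    """Keep only the first n_words tokens (assumed form of the truncation heuristic)."""
    return text.lower().split()[:n_words]

def bigrams(tokens):
    """Return adjacent word pairs, with a start marker before the first word."""
    padded = ["<s>"] + tokens
    return list(zip(padded, padded[1:]))

class BigramClassModel:
    """Add-one smoothed word-bigram model for a single class (hypothetical helper)."""

    def __init__(self):
        self.bigram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = set()

    def train(self, messages, n_words):
        for text in messages:
            tokens = first_n_words(text, n_words)
            self.vocab.update(tokens)
            for prev, word in bigrams(tokens):
                self.bigram_counts[(prev, word)] += 1
                self.context_counts[prev] += 1

    def log_prob(self, text, n_words):
        tokens = first_n_words(text, n_words)
        v = len(self.vocab) + 1  # one extra slot for unseen words
        total = 0.0
        for prev, word in bigrams(tokens):
            num = self.bigram_counts[(prev, word)] + 1
            den = self.context_counts[prev] + v
            total += math.log(num / den)
        return total

def classify(text, spam_model, legit_model, n_words=50):
    """Pick the class whose model assigns the higher log-probability."""
    spam_score = spam_model.log_prob(text, n_words)
    legit_score = legit_model.log_prob(text, n_words)
    return "spam" if spam_score > legit_score else "legitimate"

# Toy training data, invented for this example only.
spam_model = BigramClassModel()
spam_model.train(["win free money now", "free money offer now"], n_words=50)
legit_model = BigramClassModel()
legit_model.train(["meeting agenda for monday", "project meeting notes attached"], n_words=50)

print(classify("free money now", spam_model, legit_model))
```

Because only the first `n_words` tokens are read, the scoring cost per message is bounded by that value rather than by the message length, which is one plausible reading of how such a heuristic could reduce running time.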
Ali Çiltik, Tunga Güngör
Added 14 Dec 2010
Updated 14 Dec 2010
Type Journal
Year 2008
Where PRL