Automatically finding email messages that contain requests for action can provide valuable assistance to users who otherwise struggle to give appropriate attention to the actionab...
Multinomial distributions are often used to model text documents. However, they do not capture well the phenomenon that words in a document tend to appear in bursts: if a word app...
Rasmus Elsborg Madsen, David Kauchak, Charles Elka...
Support vector machines (SVMs) achieve by far the best state-of-the-art performance on text classification (TC) tasks. Due to the complexity of the TC problems, it becomes a ch...
We propose and test an objective criterion for evaluation of clustering performance: How well does a clustering algorithm run on unlabeled data aid a classification algorithm? The...
Spam e-mail with advertisement text embedded in images presents a great challenge to anti-spam filters. In this paper, we present a fast method to detect image-based spam e-mail. U...