The standard method for combating spam, either in email or on the web, is to train a classifier on manually labeled instances. As the spammers change their tactics, the performanc...
Deepak Chinavle, Pranam Kolari, Tim Oates, Tim Fin...
Federated text search provides a unified search interface for multiple search engines of distributed text information sources. Resource selection is an important component for fed...
In this paper, we present a framework for mining diverging patterns, a new type of contrast patterns whose frequency changes significantly differently in two data sets, e.g., it c...
Tourist photographs constitute a large part of the images uploaded to photo sharing platforms. But filtering methods are needed before one can extract useful knowledge from noisy ...
Adrian Popescu, Gregory Grefenstette, Pierre-Alain...
The problem of measuring similarity between web pages arises in many important Web applications, such as search engines and Web directories. In this paper, we propose a novel neig...