In this paper we propose a new parallel clustering algorithm based on the incremental construction of the compact sets of a collection of objects. This parallel algorithm is portab...
In lots of natural language processing tasks, the classes to be dealt with often occur heavily imbalanced in the underlying data set and classifiers trained on such skewed data t...
Many parallel join algorithms have been proposed in the last several years. However, most of these algorithms require that the amount of data to be joined is known in advance in o...
Large-scale web and text retrieval systems deal with amounts of data that greatly exceed the capacity of any single machine. To handle the necessary data volumes and query through...
This paper explores online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. W...
Justin Ma, Lawrence K. Saul, Stefan Savage, Geoffr...