Similarity measures in many real applications generate indefinite similarity matrices. In this paper, we consider the problem of classification based on such indefinite similariti...
Semi-supervised learning (SSL), is classification where additional unlabeled data can be used to improve accuracy. Generative approaches are appealing in this situation, as a mode...
This paper explores online learning approaches for detecting malicious Web sites (those involved in criminal scams) using lexical and host-based features of the associated URLs. W...
Justin Ma, Lawrence K. Saul, Stefan Savage, Geoffr...
We present a new approach to multiple instance learning (MIL) that is particularly effective when the positive bags are sparse (i.e. contain few positive instances). Unlike other ...
Finding good representations of text documents is crucial in information retrieval and classification systems. Today the most popular document representation is based on a vector ...