Web graphs are approximate snapshots of the web, created by search engines. Their creation is an error-prone procedure that relies on the availability of Internet nodes and the fa...
Panagiotis Papadimitriou 0002, Ali Dasdan, Hector ...
Phishing is a significant problem involving fraudulent email and web sites that trick unsuspecting users into revealing private information. In this paper, we present the design, ...
Jason I. Hong, Lorrie Faith Cranor, Yue Zhang 0002
We consider the problem of finding duplicates in data streams. Duplicate detection in data streams is utilized in various applications including fraud detection. We develop a solu...
In this work we focus on the problem of frequent itemset mining on large, out-of-core data sets. After presenting a characterization of existing out-of-core frequent itemset minin...
We introduce a new EM framework in which it is possible not only to optimize the model parameters but also the number of model components. A key feature of our approach is that we...