Duplicate detection in click streams

We consider the problem of finding duplicates in data streams. Duplicate detection in data streams is used in various applications, including fraud detection. We develop a solution based on Bloom filters [9], and discuss the space and time requirements for running the proposed algorithm over both sliding and landmark stream windows. We run a comprehensive set of experiments, using both real and synthetic click streams, to evaluate the performance of the proposed solution. The results demonstrate that the proposed solution yields extremely low error rates.
Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications--Data Mining; C.2.3 [Computer-Communication Networks]: Network Operations--Network Monitoring
General Terms: Algorithms, Security, Performance
Keywords: Data Streams, Advertising Networks, Duplicate Detection, Approximate Queries
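To make the idea in the abstract concrete, here is a minimal sketch of Bloom-filter duplicate detection over landmark windows. It is not the authors' algorithm, which this listing does not describe. It is a plain Bloom filter that is cleared at each landmark, and it does not cover the sliding-window case. The names `BloomFilter`, `add_if_new`, and `flag_duplicates`, the SHA-256 double-hashing scheme, and the default sizes are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: k hash positions per item in an m-bit array."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing (an assumption): derive k positions from two
        # 64-bit halves of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add_if_new(self, item):
        """Insert item; return True if it was (probably) seen before."""
        seen = True
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                seen = False  # at least one bit was unset, so item is new
                self.bits[byte] |= mask
        return seen

    def clear(self):
        self.bits = bytearray(len(self.bits))


def flag_duplicates(clicks, window_size, num_bits=1 << 16, num_hashes=4):
    """Yield (click_id, is_duplicate) over landmark windows of window_size clicks."""
    bf = BloomFilter(num_bits, num_hashes)
    for index, click_id in enumerate(clicks):
        if index > 0 and index % window_size == 0:
            bf.clear()  # a new landmark window starts; forget earlier clicks
        yield click_id, bf.add_if_new(click_id)


if __name__ == "__main__":
    # Hypothetical click identifiers; a real stream might key on ad, IP, and cookie.
    for click_id, is_dup in flag_duplicates(["a", "b", "a", "c", "a", "b"], window_size=4):
        print(click_id, is_dup)
```

A Bloom filter never misses a true duplicate within a window, but it can occasionally flag a fresh click as a duplicate. The chance of that depends on the number of bits, the number of hash functions, and how many clicks fall in one window.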
Ahmed Metwally, Divyakant Agrawal, Amr El Abbadi
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2005
Where WWW