In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
This paper focuses on spam blog (splog) detection. Blogs are highly popular, new media social communication mechanisms and splogs corrupt blog search results as well as waste netw...
Yu-Ru Lin, Hari Sundaram, Yun Chi, Jun'ichi Tatemu...
The early success of link-based ranking algorithms was predicated on the assumption that links imply merit of the target pages. However, today many links exist for purposes other ...
Botnets are large groups of compromised machines (bots) used by miscreants for the most illegal activities (e.g., sending spam emails, denial-of-service attacks, phishing and other...
Emanuele Passerini, Roberto Paleari, Lorenzo Marti...
The expressive power of regular expressions has been often exploited in network intrusion detection systems, virus scanners, and spam filtering applications. However, the flexibl...