
WWW
2010
ACM

Identifying spam link generators for monitoring emerging web spam

In this paper, we address the question of how to identify hosts that will generate links to web spam. Detecting such spam link generators is important because almost all new spam links are created by them. By monitoring spam link generators, we can quickly find emerging web spam, which can then be used to update existing spam filters. To classify spam link generators, we investigate various link-based features, including modified PageRank scores computed from white and spam seeds, and the same scores of neighboring hosts. An online learning algorithm is used to handle large-scale data, and the effectiveness of the various features is examined. Experiments on three yearly archives of the Japanese Web show that we can predict spam link generators with reasonable performance.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval
General Terms: Experimentation, Measurement
Keywords: Link analysis, Web spam, Information retrieval
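The abstract names three ingredients: PageRank variants biased toward white or spam seed hosts, the same scores taken over neighboring hosts, and an online learner. The Python sketch below illustrates how these could fit together on a toy host graph. It is not the authors' implementation. The teleport-to-seeds formulation (in the spirit of TrustRank), averaging over out-neighbors, the perceptron as the online learner, and all host and function names (`seeded_pagerank`, `host_features`, `train_perceptron`) are assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Graph = Dict[str, List[str]]  # host -> hosts it links to; every host must be a key


def seeded_pagerank(graph: Graph, seeds: Sequence[str],
                    damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """PageRank by power iteration whose random jump lands only on seed hosts."""
    seed_set = set(seeds)
    teleport = {h: (1.0 / len(seed_set) if h in seed_set else 0.0) for h in graph}
    score = dict(teleport)
    for _ in range(iterations):
        nxt = {h: (1.0 - damping) * teleport[h] for h in graph}
        for src, dsts in graph.items():
            if dsts:
                share = damping * score[src] / len(dsts)
                for dst in dsts:
                    nxt[dst] += share
            else:
                # Dangling host: redistribute its mass over the seeds.
                for h in graph:
                    nxt[h] += damping * score[src] * teleport[h]
        score = nxt
    return score


def host_features(graph: Graph, white: Dict[str, float],
                  spam: Dict[str, float]) -> Dict[str, List[float]]:
    """[own white score, own spam score, mean white score and mean spam score of out-neighbors]."""
    feats = {}
    for h, dsts in graph.items():
        n = len(dsts)
        nbr_white = sum(white[d] for d in dsts) / n if n else 0.0
        nbr_spam = sum(spam[d] for d in dsts) / n if n else 0.0
        feats[h] = [white[h], spam[h], nbr_white, nbr_spam]
    return feats


def predict(w: List[float], b: float, x: List[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(examples, epochs: int = 10, lr: float = 1.0):
    """Online perceptron (one example at a time); label 1 = spam link generator."""
    w, b = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in examples:
            err = y - predict(w, b, x)
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b


# Toy graph with hypothetical hosts: w* are white seeds, s* are spam seeds,
# g1 links into the spam cluster, n1 links to white hosts.
graph: Graph = {
    "w1": ["w2", "n1"], "w2": ["w1"], "s1": ["s2"], "s2": ["s1"],
    "g1": ["s1", "s2"], "n1": ["w1", "w2"],
}
white = seeded_pagerank(graph, ["w1", "w2"])
spam = seeded_pagerank(graph, ["s1", "s2"])
feats = host_features(graph, white, spam)
w, b = train_perceptron([(feats["g1"], 1), (feats["n1"], 0)])
print(predict(w, b, feats["g1"]), predict(w, b, feats["n1"]))  # 1 0
```

In this toy example, nothing links to g1, so its own seed-derived scores are zero; its strongest signal is the high spam score of the hosts it links to, which is the kind of evidence the neighbor features are meant to capture.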
Type Conference
Year 2010
Where WWW
Authors Young-joo Chung, Masashi Toyoda, Masaru Kitsuregawa