Sciweavers

WWW
2008
ACM

Improving web spam detection with re-extracted features

Web spam detection has become one of the top challenges for the Internet search industry. Instead of relying on heuristic rules, we propose a feature re-extraction strategy to improve the detection result. Based on the spamicity predicted by a preliminary detection step, three types of features are extracted through the host-level web graph. Experiments on the WEBSPAM-UK2006 benchmark show that this strategy clearly improves web spam detection performance.
Categories and Subject Descriptors H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia; K.4.m [Computers and Society]: Miscellaneous; H.4.m [Information Systems Applications]: Miscellaneous
General Terms Measurement, Experimentation, Algorithms
Keywords Link spam, Content spam, Web spam, Machine learning
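The re-extraction idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the three feature types, so the ones below (a host's own predicted spamicity, plus the mean spamicity of its in-link and out-link neighbours) are assumptions chosen only for illustration. The function name `reextract_features` and the dictionary-based graph format are likewise hypothetical and not taken from the paper.

```python
from statistics import mean


def reextract_features(graph, spamicity):
    """Derive second-pass features from first-pass spamicity scores.

    graph:     dict mapping each host to the list of hosts it links to
               (an assumed host-level web graph format).
    spamicity: dict mapping each host to the spam probability predicted
               by a preliminary classifier (assumed to lie in [0, 1]).
    """
    # Build the reverse adjacency list so in-link neighbours can be looked up.
    in_links = {host: [] for host in spamicity}
    for src, targets in graph.items():
        for dst in targets:
            in_links.setdefault(dst, []).append(src)

    features = {}
    for host in spamicity:
        # Only neighbours that received a first-pass score contribute.
        out_scores = [spamicity[t] for t in graph.get(host, []) if t in spamicity]
        in_scores = [spamicity[s] for s in in_links.get(host, []) if s in spamicity]
        features[host] = {
            "own": spamicity[host],
            # Hosts with no scored neighbours default to 0.0 (an arbitrary choice).
            "mean_in": mean(in_scores) if in_scores else 0.0,
            "mean_out": mean(out_scores) if out_scores else 0.0,
        }
    return features
```

In a pipeline of this kind, the returned features would presumably be appended to the original feature vector before a second classifier is trained; that step is omitted here because the abstract does not describe it.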
Guanggang Geng, Chunheng Wang, Qiudan Li
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2008
Where WWW
Authors Guanggang Geng, Chunheng Wang, Qiudan Li