AIRWEB
2009
Springer

Looking into the past to better classify web spam

Web spamming techniques aim to achieve undeserved rankings in search results. Research has been widely conducted on identifying such spam and neutralizing its influence. However, existing spam detection work only considers current information. We argue that historical web page information may also be important in spam classification. In this paper, we use content features from historical versions of web pages to improve spam classification. We use supervised learning techniques to combine classifiers based on current page content with classifiers based on temporal features. Experiments on the WEBSPAM-UK2007 dataset show that our approach improves spam classification F-measure performance by 30% compared to a baseline classifier which only considers current page content.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services; I.7.5 ...
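To make the evaluation idea in the abstract concrete, here is a minimal sketch of scoring a current-content classifier against a combination of current and temporal classifiers using F-measure. It is an illustration only, not the authors' method: the paper combines classifiers with supervised learning, whereas this sketch substitutes a fixed weighted average. The function names, the weight, the threshold, and all scores and labels are hypothetical toy values and are not taken from WEBSPAM-UK2007.

```python
def f_measure(y_true, y_pred, beta=1.0):
    """F-measure of the positive (spam) class from boolean labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def combine_scores(current, temporal, weight=0.5):
    """Stand-in combiner (assumption): fixed weighted average of two spam scores."""
    return [weight * c + (1 - weight) * t for c, t in zip(current, temporal)]


def classify(scores, threshold=0.5):
    """Label a page as spam when its score reaches the threshold."""
    return [s >= threshold for s in scores]


# Toy data, invented for illustration: True means spam.
labels = [True, True, False, False, True]
current_scores = [0.9, 0.3, 0.2, 0.6, 0.4]   # hypothetical current-content classifier
temporal_scores = [0.8, 0.9, 0.1, 0.2, 0.7]  # hypothetical temporal-feature classifier

baseline_f = f_measure(labels, classify(current_scores))
combined_f = f_measure(labels, classify(combine_scores(current_scores, temporal_scores)))
print(f"baseline F1 = {baseline_f:.2f}, combined F1 = {combined_f:.2f}")
```

In a learned setting, the fixed weight would be replaced by a meta-classifier trained on the outputs of the individual classifiers, with the F-measure computed on held-out pages.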
Na Dai, Brian D. Davison, Xiaoguang Qi
Added: 25 May 2010
Updated: 25 May 2010
Type: Conference
Year: 2009
Where: AIRWEB
Authors: Na Dai, Brian D. Davison, Xiaoguang Qi