Blocking Blog Spam with Language Model Disagreement



We present an approach for detecting link spam common in blog comments by comparing the language models used in the blog post, the comment, and pages linked by the comments. In contrast to other link spam filtering approaches, our method requires no training, no hard-coded rule sets, and no knowledge of complete-web connectivity. Preliminary experiments with identification of typical blog spam show promising results.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - search engine spam; I.7.5 [Document Capture]: Document analysis - document classification, spam filtering; K.4.1 [Computers and Society]: Public Policy Issues - abuse and crime involving computers, privacy

General Terms: Algorithms, Languages, Legal Aspects

Keywords: Comment spam, language models, blogs
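The comparison described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the language models are built or compared, so the additively smoothed unigram models, the KL divergence measure, the `alpha` value, and all function names here are hypothetical choices. The idea shown is that a comment (or the page it links to) whose word distribution diverges strongly from that of the blog post is a spam candidate.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def unigram_model(tokens, vocabulary, alpha=0.1):
    """Additively smoothed unigram distribution over a shared vocabulary.

    The smoothing scheme and alpha are assumptions made for this sketch.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocabulary)
    return {word: (counts[word] + alpha) / total for word in vocabulary}


def kl_divergence(p, q):
    """KL(p || q) in bits; both models must cover the same vocabulary."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def disagreement(post_text, other_text, alpha=0.1):
    """Divergence between the post's model and another text's model.

    `other_text` can be a comment or the text of a page the comment links to.
    """
    post_tokens = tokenize(post_text)
    other_tokens = tokenize(other_text)
    vocabulary = set(post_tokens) | set(other_tokens)
    if not vocabulary:
        return 0.0
    post_model = unigram_model(post_tokens, vocabulary, alpha)
    other_model = unigram_model(other_tokens, vocabulary, alpha)
    return kl_divergence(post_model, other_model)


# Made-up example texts, for illustration only.
post = ("I planted tomatoes and basil in the garden this spring "
        "and the tomatoes are growing well")
on_topic = "My tomatoes are growing well too, basil helps the garden"
spam = "Cheap pills online casino poker best prices click here"

on_topic_score = disagreement(post, on_topic)
spam_score = disagreement(post, spam)
print(f"on-topic comment: {on_topic_score:.3f}")
print(f"spam comment:     {spam_score:.3f}")
```

In this sketch a higher score means stronger disagreement with the post. How to turn scores into a spam decision (for example, where to place a cutoff) is not stated in the abstract and is left open here.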

Gilad Mishne, David Carmel, Ronny Lempel


AIRWEB 2005 | Internet Technology | Link Spam | Spam filtering Approaches | Typical Blog Spam


Added	26 Jun 2010
Updated	26 Jun 2010
Type	Conference
Year	2005
Where	AIRWEB
Authors	Gilad Mishne, David Carmel, Ronny Lempel
