Exploring linguistic features for web spam detection: a preliminary study

15 years 2 months ago

Download airweb.cse.lehigh.edu

We study the usability of linguistic features in the Web spam classification task. The features were computed on two Web spam corpora, Webspam-Uk2006 and Webspam-Uk2007, and we make them publicly available for other researchers. Preliminary analysis suggests that certain linguistic features may be useful for the spam-detection task when combined with features studied elsewhere.

Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content Analysis and Indexing--Linguistic processing; I.2.6 [Artificial Intelligence]: Learning

General Terms: Web spam

Keywords: Web spam detection, content features, linguistic features

Jakub Piskorski, Marcin Sydow, Dawid Weiss
