Detecting Wikipedia Vandalism using Machine Learning - Notebook for PAN at CLEF 2011

Identifying vandalism in Wikipedia is a complex problem that is still handled mostly manually by volunteers. This paper presents the main components of a system built by our group to automatically identify vandalized Wikipedia articles. Its core is a machine learning component that uses features grouped into three classes: Metadata, Text and Language. In addition to the features used in previous approaches, we consider four new features related to vulgar, biased, sexual and miscellaneous bad words. The system obtained a PR-AUC (area under the precision-recall curve) of 0.42464 and a ROC-AUC (area under the ROC curve) of 0.82963.
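The four word-list features mentioned in the abstract could be computed roughly as in the sketch below. This is only an illustration, not the authors' implementation: the abstract does not say how the features are defined, so the sketch assumes each one is a simple count of listed words in the text an edit inserts. The names `WORD_LISTS`, `tokenize` and `bad_word_features`, and the short placeholder word lists, are hypothetical.

```python
import re

# Hypothetical placeholder lists; the word lists used in the paper
# are not given in the abstract.
WORD_LISTS = {
    "vulgar": {"crap", "damn"},
    "biased": {"best", "worst", "greatest"},
    "sexual": {"sex", "porn"},
    "misc_bad": {"stupid", "idiot", "lol"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bad_word_features(inserted_text):
    """Return one count per word-list category for the inserted text.

    Assumption: each feature is the number of tokens that appear in
    the corresponding list. The paper may define them differently,
    for example as frequencies relative to the edit length.
    """
    tokens = tokenize(inserted_text)
    return {
        name: sum(1 for token in tokens if token in words)
        for name, words in WORD_LISTS.items()
    }


if __name__ == "__main__":
    print(bad_word_features("This is the WORST, most stupid article. lol"))
```

Counts like these would then be combined with the Metadata, Text and Language features into a single feature vector for the classifier.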

Cristian-Alexandru Dragusanu, Marina Cufliuc, Adrian Iftene

Auc | CLEF 2011 | Information Technology | Roc | Vandalism |

» Wiki Vandalysis Wikipedia Vandalism Analysis Lab Report for PAN at CLEF 2010

» Overview of the 2nd International Competition on Wikipedia Vandalism Detection

» Automatic Vandalism Detection in Wikipedia

Post Info

Added	18 Dec 2011
Updated	18 Dec 2011
Type	Journal
Year	2011
Where	CLEF
Authors	Cristian-Alexandru Dragusanu, Marina Cufliuc, Adrian Iftene
