Sciweavers

CLEF
2011
Springer

Detecting Wikipedia Vandalism using Machine Learning - Notebook for PAN at CLEF 2011

Identifying vandalism on Wikipedia is a complex problem that is currently handled mostly manually by volunteers. This paper presents the main components of a system built by our group to automatically identify vandalized Wikipedia articles. Its core is a machine learning component that uses features grouped into three classes: Metadata, Text and Language. In addition to the features used in previous approaches, we consider four new features related to vulgar, biased, sexual and miscellaneous bad words. The system obtained an area under the precision-recall curve (PR-AUC) of 0.42464 and an area under the ROC curve (ROC-AUC) of 0.82963.
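The abstract does not say how the four bad-word features are computed, so the following is only a minimal sketch of one plausible reading: each category is scored by how often words from a category word list occur in the text of an edit. The word lists, the function names `tokenize` and `bad_word_features`, and the choice of emitting both a count and a ratio per category are all assumptions made for illustration, not details taken from the paper.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists (assumption: the paper's
# actual lists are not given in the abstract).
WORD_LISTS = {
    "vulgar": {"crap", "damn"},
    "biased": {"best", "worst", "awesome"},
    "sexual": {"sex", "porn"},
    "misc_bad": {"stupid", "idiot", "lol"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bad_word_features(inserted_text):
    """Return a count and a ratio feature for each bad-word category.

    The ratio is the share of tokens that belong to the category;
    it is 0.0 for empty text so the function never divides by zero.
    """
    tokens = tokenize(inserted_text)
    total = len(tokens)
    counts = Counter(tokens)
    features = {}
    for category, words in WORD_LISTS.items():
        hits = sum(counts[w] for w in words)
        features[category + "_count"] = hits
        features[category + "_ratio"] = hits / total if total else 0.0
    return features


if __name__ == "__main__":
    print(bad_word_features("This is the WORST stupid article, lol"))
```

Features of this kind would be combined with the Metadata and Text features into a single vector per edit before being passed to a classifier; which classifier the authors used is not stated in the abstract.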
Added 18 Dec 2011
Updated 18 Dec 2011
Type Journal
Year 2011
Where CLEF
Authors Cristian-Alexandru Dragusanu, Marina Cufliuc, Adrian Iftene