Sciweavers

SIGIR
2010
ACM

Crowdsourcing a wikipedia vandalism corpus

We report on the construction of the PAN Wikipedia vandalism corpus, PAN-WVC-10, using Amazon’s Mechanical Turk. The corpus comprises 32 452 edits on 28 468 Wikipedia articles, among which 2 391 vandalism edits have been identified. 753 human annotators cast a total of 193 022 votes on the edits, so that each edit was reviewed by at least 3 annotators; the achieved level of agreement was then analyzed to label each edit as “regular” or “vandalism.” The corpus is available free of charge.

Categories and Subject Descriptors: H.3.4 [Information Storage and Retrieval]: Systems and Software—Performance Evaluation

General Terms: Experimentation
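The abstract states that each edit received at least 3 votes and that annotator agreement determined its label, but it does not give the exact rule. The following is a minimal sketch of one plausible aggregation scheme, not the authors' procedure: the function name `label_edit`, the two-thirds agreement threshold, and the choice to leave low-agreement edits unlabeled are all assumptions made for illustration.

```python
from collections import Counter
from typing import Optional, Sequence


def label_edit(
    votes: Sequence[str],
    min_votes: int = 3,              # the abstract reports at least 3 annotators per edit
    min_agreement: float = 2 / 3,    # assumed threshold, not taken from the source
) -> Optional[str]:
    """Return the majority label if agreement is high enough, else None."""
    if len(votes) < min_votes:
        return None  # too few votes to decide
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return label
    return None  # agreement too low; the edit would need more votes


if __name__ == "__main__":
    print(label_edit(["vandalism", "vandalism", "regular"]))
    print(label_edit(["regular", "vandalism", "regular", "vandalism"]))
```

Under these assumptions, an edit with two "vandalism" votes out of three is labeled "vandalism", while an evenly split edit stays unlabeled until further votes arrive.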
Martin Potthast
Added 16 Aug 2010
Updated 16 Aug 2010
Type Conference
Year 2010
Where SIGIR
Authors Martin Potthast