SIGIR
2012
ACM

Predicting quality flaws in user-generated content: the case of Wikipedia

The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content treats the task as a binary classification: is the content high-quality or low-quality? This paper goes one step further: it targets the prediction of quality flaws, thereby indicating specifically in which respects low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to mark content that has some shortcomings. We apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. We present an automatic mining approach to identify the existing cleanup tags, which provides us with a training corpus of labeled Wikipedia articles. We argue that common binary or multiclass clas...
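As a rough illustration of how cleanup tags can yield a labeled corpus, the sketch below scans article wikitext for template names and groups article titles by the flaw tag they carry. It is not the authors' mining approach, which the abstract does not detail: the tag list `CLEANUP_TAGS`, the regular expression, and the helper names `cleanup_tags` and `build_corpus` are all assumptions made for the example.

```python
import re

# Hypothetical, illustrative tag list. The paper mines the real set of
# cleanup tags automatically; these three names are only an assumption.
CLEANUP_TAGS = {"unreferenced", "advert", "orphan"}

# Captures the template name at the start of a {{...}} transclusion,
# stopping at the first parameter separator or the closing braces.
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def cleanup_tags(wikitext):
    """Return the known cleanup tags that appear in an article's wikitext."""
    names = {m.group(1).strip().lower() for m in TEMPLATE_NAME.finditer(wikitext)}
    return names & CLEANUP_TAGS


def build_corpus(articles):
    """Map each flaw tag to the titles of the articles that carry it."""
    corpus = {tag: [] for tag in sorted(CLEANUP_TAGS)}
    for title, text in articles.items():
        for tag in cleanup_tags(text):
            corpus[tag].append(title)
    return corpus


if __name__ == "__main__":
    sample = {
        "A": "{{Unreferenced|date=May 2012}} Text.",
        "B": "{{Advert}} Buy now. {{ Orphan }}",
        "C": "Clean article with {{Infobox person|name=X}}.",
    }
    print(build_corpus(sample))
```

Each tag's article list can then serve as the positive examples for that flaw; how negative examples are chosen is left open here, since the abstract is cut off before describing the classification setup.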
Maik Anderka, Benno Stein, Nedim Lipka
Added 28 Sep 2012
Updated 28 Sep 2012
Type Journal
Year 2012
Where SIGIR
Authors Maik Anderka, Benno Stein, Nedim Lipka