Identifying featured articles in Wikipedia: writing style matters

Wikipedia provides an information quality assessment model with criteria that human peer reviewers use to identify featured articles. For the classification task “Is an article featured or not?” we present a machine learning approach that exploits an article’s character trigram distribution. Our approach differs from existing research in that it targets writing style rather than evaluating meta features such as the edit history. The approach is robust, straightforward to implement, and outperforms existing solutions. We support these claims with an experimental design that, among other things, analyzes domain transferability. The F-measure achieved for featured articles is 0.964 within a single Wikipedia domain and 0.880 in a domain transfer situation.
Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval; H.5.3 [Information Interfaces]: Group and Organization Interfaces. General Terms: Algorithms, E...
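The abstract does not say exactly how the trigram distribution is computed. As a minimal sketch, assuming overlapping character trigrams taken over the raw article text with no case folding or markup stripping (both assumptions, not details from the paper), an article's character trigram distribution could be computed as below. The function name `char_trigram_distribution` is hypothetical, and the classifier that would consume these features is not modeled here.

```python
from collections import Counter


def char_trigram_distribution(text: str) -> dict:
    """Return the relative frequency of each overlapping character trigram.

    Hypothetical helper: assumes raw text, no lowercasing, no markup removal.
    """
    # Slide a window of width 3 over the character sequence.
    trigrams = [text[i:i + 3] for i in range(len(text) - 2)]
    counts = Counter(trigrams)
    total = sum(counts.values())
    if total == 0:
        # Texts shorter than three characters contain no trigrams.
        return {}
    # Normalize counts so the frequencies sum to 1.
    return {gram: n / total for gram, n in counts.items()}
```

For example, `char_trigram_distribution("abcab")` yields the three trigrams `"abc"`, `"bca"`, and `"cab"`, each with relative frequency 1/3. Normalizing by the total count makes articles of different lengths comparable, which is one plausible reason to use a distribution rather than raw counts.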
Nedim Lipka, Benno Stein
Added 14 May 2010
Updated 14 May 2010
Type Conference
Year 2010
Where WWW