
COLING
2002

(Semi-)Automatic Detection of Errors in PoS-Tagged Corpora

This paper presents a simple yet, in practice, very efficient technique for automatically detecting those positions in a part-of-speech tagged corpus where an error is to be suspected. The approach is based on learning and subsequently applying "negative bigrams", i.e. on searching for pairs of adjacent tags which constitute an incorrect configuration in a text of a particular language (in English, e.g., the bigram ARTICLE - FINITE VERB). Further, the paper describes the generalization of "negative bigrams" into "negative n-grams", for any natural number n, which provides a powerful tool for error detection in a corpus. The implementation is also discussed, as well as an evaluation of the results of the approach when used for error detection in the NEGRA
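The abstract does not say exactly how negative bigrams are learned. Purely as an illustration, the Python sketch below assumes a simple criterion: a tag n-gram is a candidate negative n-gram if it never occurs in a training corpus that is assumed to be error-free. Any occurrence of such an n-gram in the corpus under inspection then marks a suspected error position. The function names (`ngrams`, `learn_negative_ngrams`, `detect_errors`) and the toy tags `ART`, `NOUN`, `VFIN`, `ADJ` are hypothetical. The naive n-gram extension used here (plain unseen n-tuples) is not necessarily the generalization the paper describes.

```python
from itertools import product
from typing import Iterable, List, Sequence, Set, Tuple

Tag = str
NGram = Tuple[Tag, ...]


def ngrams(tags: Sequence[Tag], n: int) -> List[NGram]:
    """Return all contiguous n-grams of a tag sequence, in order."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def learn_negative_ngrams(
    training: Iterable[Sequence[Tag]], tagset: Sequence[Tag], n: int = 2
) -> Set[NGram]:
    """Candidate negative n-grams (ASSUMED criterion, not the paper's):
    every n-tuple over the tagset that never occurs in the training
    sentences, which are assumed to be correctly tagged.
    Enumerating tagset**n tuples is only practical for small n."""
    seen: Set[NGram] = set()
    for sentence in training:
        seen.update(ngrams(sentence, n))
    return {g for g in product(tagset, repeat=n) if g not in seen}


def detect_errors(
    sentence: Sequence[Tag], negative: Set[NGram], n: int = 2
) -> List[int]:
    """Start positions in `sentence` where a negative n-gram occurs."""
    return [i for i, g in enumerate(ngrams(sentence, n)) if g in negative]


# Hypothetical toy data: ART = article, VFIN = finite verb.
TAGSET = ["ART", "NOUN", "VFIN", "ADJ"]
TRAINING = [
    ["ART", "NOUN", "VFIN"],
    ["ART", "ADJ", "NOUN", "VFIN", "ART", "NOUN"],
]
NEG = learn_negative_ngrams(TRAINING, TAGSET, n=2)

# ART directly followed by VFIN never occurs in TRAINING, so it is flagged.
print(detect_errors(["NOUN", "VFIN", "ART", "VFIN"], NEG))  # [2]
```

Because a tag pair can be absent from a finite training corpus without being ungrammatical, candidates learned this way would normally need manual vetting before use; the sketch skips that step.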
Pavel Kveton, Karel Oliva
Added 17 Dec 2010
Updated 17 Dec 2010
Type Journal
Year 2002
Where COLING
Authors Pavel Kveton, Karel Oliva