Sciweavers

COLING
2002

Detecting Errors in Corpora Using Support Vector Machines

While corpus-based research relies on human-annotated corpora, it is often said that a non-negligible number of errors remain even in frequently used corpora such as the Penn Treebank. Detecting errors in annotated corpora is therefore important for corpus-based natural language processing. In this paper, we propose a method to detect errors in corpora using support vector machines (SVMs). The method is based on the idea of extracting exceptional elements that violate consistency. Specifically, we use SVMs to assign a weight to each element and to find errors in a POS-tagged corpus. We apply the method to English and Japanese POS-tagged corpora and achieve high precision in detecting errors.
Tetsuji Nakagawa, Yuji Matsumoto
Added 17 Dec 2010
Updated 17 Dec 2010
Type Conference
Year 2002
Where COLING
Authors Tetsuji Nakagawa, Yuji Matsumoto
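
The abstract's idea of weighting each element with an SVM and treating heavily weighted elements as likely errors can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the "weight" is read as the dual coefficient (alpha) of each training example, the classifier is a linear squared-hinge (L2-loss) SVM trained by dual coordinate descent (chosen here because its dual solution is unique, so the ranking is well defined), the features are word identity only, and the corpus is a hand-made toy in which one token "the" is deliberately tagged NN. The names train_dual_weights, featurize, corpus, and suspect are hypothetical.

```python
# Toy sketch: rank POS-tagged tokens by their SVM dual weight (alpha).
# Assumptions (not from the paper): squared-hinge linear SVM, a single
# one-vs-rest task ("is the tag DT?"), word-identity features, toy corpus.

def train_dual_weights(X, y, C=1.0, epochs=200):
    """Dual coordinate descent for a linear L2-loss SVM.

    Returns one non-negative alpha per training example.
    """
    n, dim = len(X), len(X[0])
    w = [0.0] * dim              # primal weights, kept in sync with alpha
    alpha = [0.0] * n
    diag = 1.0 / (2.0 * C)       # extra diagonal term of the L2-loss dual
    for _ in range(epochs):
        for i in range(n):
            xi, yi = X[i], y[i]
            q_ii = sum(v * v for v in xi) + diag
            score = sum(wj * v for wj, v in zip(w, xi))
            grad = yi * score - 1.0 + diag * alpha[i]
            new_alpha = max(alpha[i] - grad / q_ii, 0.0)
            delta = (new_alpha - alpha[i]) * yi
            if delta:
                w = [wj + delta * v for wj, v in zip(w, xi)]
            alpha[i] = new_alpha
    return alpha


# Toy corpus of (word, tag) pairs; the lone ("the", "NN") is the planted error.
corpus = (
    [("the", "DT")] * 5
    + [("the", "NN")]
    + [("dog", "NN")] * 3
    + [("a", "DT")] * 3
)

vocab = sorted({word for word, _ in corpus})


def featurize(word):
    """One-hot word identity plus a constant bias feature."""
    return [1.0 if word == v else 0.0 for v in vocab] + [1.0]


X = [featurize(word) for word, _ in corpus]
y = [1 if tag == "DT" else -1 for _, tag in corpus]

alphas = train_dual_weights(X, y)

# Elements the classifier cannot fit consistently receive the largest alpha.
ranked = sorted(zip(alphas, corpus), key=lambda pair: pair[0], reverse=True)
suspect = ranked[0][1]
print("most suspicious token:", suspect)
```

In this toy setting the inconsistent token ends up with the largest alpha because no separating weight vector can satisfy it together with the five identical tokens tagged DT. A real corpus would need richer context features and one such ranking per tag, and the resulting candidates would still have to be checked by hand.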