JBCB
2010

Calbc Silver Standard Corpus

The production of gold standard corpora is time-consuming and costly. We propose an alternative: the 'silver standard corpus' (SSC), a corpus generated by harmonising the annotations delivered by a selection of annotation systems. The systems have to share the type system for the annotations, and the harmonisation solution has to use a suitable similarity measure for the pair-wise comparison of the annotations. The annotation systems have been evaluated against the harmonised set (630,324 sentences, 15,956,841 tokens). We can demonstrate that the annotation of proteins and genes shows higher diversity across all annotation solutions used, leading to lower agreement with the harmonised set than the annotations of diseases and species. An analysis of the most frequent annotations from all systems shows that high agreement amongst systems leads to the selection of terms that are suitable to be kept in the harmonised set. This is t...
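The abstract does not spell out the harmonisation procedure itself. The following is a minimal sketch under stated assumptions, not the CALBC implementation: annotations are character-offset spans with a semantic type from a shared type system; the pair-wise similarity measure is a swapped-in span-overlap (Jaccard) score that is zero across types; an annotation enters the harmonised set when at least `min_votes` systems agree on it; and each system is then scored by precision, recall and F1 against that set. All names (`Annotation`, `similarity`, `harmonise`, `evaluate`, `min_votes`, `threshold`) and the type labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Annotation:
    start: int     # character offset, inclusive
    end: int       # character offset, exclusive
    sem_type: str  # label from the shared type system (illustrative labels below)


def similarity(a: Annotation, b: Annotation) -> float:
    """Assumed pair-wise measure: Jaccard overlap of character spans, 0 if types differ."""
    if a.sem_type != b.sem_type:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    union = max(a.end, b.end) - min(a.start, b.start)
    return overlap / union


def harmonise(systems: Dict[str, List[Annotation]],
              min_votes: int = 2, threshold: float = 0.5) -> List[Annotation]:
    """Keep annotations supported by at least min_votes systems (voting assumption)."""
    candidates = {a for annots in systems.values() for a in annots}
    scored = []
    for cand in candidates:
        # Count systems (including the proposing one) with a matching annotation.
        votes = sum(1 for annots in systems.values()
                    if any(similarity(cand, other) >= threshold for other in annots))
        if votes >= min_votes:
            scored.append((votes, cand))
    # Highest-voted first; ties broken deterministically by span and type.
    scored.sort(key=lambda vc: (-vc[0], vc[1].start, vc[1].end, vc[1].sem_type))
    kept: List[Annotation] = []
    for _, cand in scored:
        # Drop candidates that duplicate an annotation already kept.
        if all(similarity(cand, k) < threshold for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda a: (a.start, a.end, a.sem_type))


def evaluate(system: List[Annotation], ssc: List[Annotation],
             threshold: float = 0.5) -> Tuple[float, float, float]:
    """Precision, recall and F1 of one system against the harmonised set."""
    tp_sys = sum(1 for a in system if any(similarity(a, g) >= threshold for g in ssc))
    tp_ssc = sum(1 for g in ssc if any(similarity(g, a) >= threshold for a in system))
    precision = tp_sys / len(system) if system else 0.0
    recall = tp_ssc / len(ssc) if ssc else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy output of three hypothetical systems on one sentence.
systems = {
    "sysA": [Annotation(0, 5, "PRGE"), Annotation(20, 28, "DISO")],
    "sysB": [Annotation(0, 5, "PRGE"), Annotation(20, 27, "DISO")],
    "sysC": [Annotation(0, 9, "PRGE"), Annotation(40, 45, "SPE")],
}

if __name__ == "__main__":
    ssc = harmonise(systems)
    print(ssc)  # the SPE span has only one vote and is left out
    for name, annots in systems.items():
        print(name, evaluate(annots, ssc))
```

With these toy inputs the harmonised set keeps one PRGE span and one DISO span, and `sysC`, which disagrees on the DISO span and adds an unsupported SPE span, scores lower than `sysA`. The overlap threshold and vote count are the two knobs that trade boundary tolerance against the level of agreement required.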
Added 28 Jan 2011
Updated 28 Jan 2011
Type Journal
Year 2010
Where JBCB
Authors Dietrich Rebholz-Schuhmann, Antonio Jimeno-Yepes, Erik M. van Mulligen, Ning Kang, Jan A. Kors, David Milward, Peter Corbett, Ekaterina Buyko, Elena Beisswanger, Udo Hahn