LREC
2010

Alignment-based Profiling of Europarl Data in an English-Swedish Parallel Corpus

This paper profiles the Europarl part of an English-Swedish parallel corpus and compares it with three other subcorpora of the same parallel corpus. We first describe our method for comparison, which is based on alignments at both the token level and the structural level. Although two of the other subcorpora contain fiction, the Europarl part is found to have the highest proportion of many types of restructurings, including additions, deletions and long-distance reorderings. We explain this by the fact that the majority of Europarl segments are parallel translations.
Lars Ahrenberg
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2010
Where LREC
Authors Lars Ahrenberg