
ACL
1998

Bitext Correspondences through Rich Mark-up

Rich mark-up can considerably benefit the process of establishing bitext correspondences, that is, the task of correctly identifying and aligning text segments that are translation equivalents of each other in a parallel corpus. We present a sentence alignment algorithm that, by taking advantage of previously annotated texts, obtains accuracy rates close to 100%. The algorithm evaluates the similarity of the linguistic and extralinguistic mark-up on both sides of a bitext. Because annotations are neutral with respect to typological, grammatical and orthographical differences between languages, rich mark-up is a particularly well-suited foundation for bitext correspondences. The main originality of this approach is that it makes maximal use of annotations, which makes it a sensible and efficient way to exploit parallel corpora whenever such annotations exist.
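The abstract does not specify the similarity measure or the search procedure, so the following is only a hypothetical sketch of the general idea, not the authors' algorithm. It assumes each sentence has been reduced to a multiset of the mark-up tag names it carries (the tag names "date", "num" and "name" are invented for illustration), scores candidate pairs with a Dice coefficient over those multisets (a stand-in measure chosen here), and finds the lowest-cost monotone alignment with Gale-Church-style dynamic programming over 1-1, 1-0, 0-1, 2-1 and 1-2 beads. The parameters `skip_cost` and `merge_penalty` are arbitrary illustrative assumptions.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical representation (not from the paper): each sentence is reduced
# to a multiset of the mark-up tag names it contains, e.g. {"date": 1, "num": 2}.
Bead = Tuple[Tuple[int, ...], Tuple[int, ...]]


def markup_similarity(a: Counter, b: Counter) -> float:
    """Dice coefficient over tag multisets: 2 * |A & B| / (|A| + |B|)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # no mark-up on either side gives no evidence of a match
    overlap = sum((a & b).values())
    return 2.0 * overlap / total


def align(src: List[Counter], tgt: List[Counter],
          skip_cost: float = 1.0, merge_penalty: float = 0.1) -> List[Bead]:
    """Lowest-cost monotone alignment using 1-1, 1-0, 0-1, 2-1 and 1-2 beads."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[int, int]]]] = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    moves = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    # Row-major order guarantees every predecessor of (i, j) is final first.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in moves:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    step = skip_cost  # a sentence left unaligned
                else:
                    # Merge the tag multisets of all sentences in the bead.
                    s = sum(src[i:ni], Counter())
                    t = sum(tgt[j:nj], Counter())
                    step = 1.0 - markup_similarity(s, t)
                    if di + dj > 2:
                        step += merge_penalty  # prefer 1-1 beads
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Walk the back-pointers from (n, m) to (0, 0) to recover the beads.
    beads: List[Bead] = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((tuple(range(pi, i)), tuple(range(pj, j))))
        i, j = pi, pj
    beads.reverse()
    return beads


# Usage: two source sentences map onto one target sentence.
src = [Counter({"date": 1, "num": 2}), Counter({"name": 1}), Counter({"num": 1})]
tgt = [Counter({"date": 1, "num": 2}), Counter({"name": 1, "num": 1})]
print(align(src, tgt))  # [((0,), (0,)), ((1, 2), (1,))]
```

Practical aligners typically combine a mark-up score like this with other evidence such as sentence length; the sketch uses mark-up alone only to isolate the idea described in the abstract.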
Type Conference
Year 1998
Where ACL
Authors Raquel Martínez, Joseba Abaitua, Arantza Casillas