JCDL
2010
ACM

Transferring structural markup across translations using multilingual alignment and projection

We present here a method for automatically projecting structural information across translations, including canonical citation structure (such as chapters and sections), speaker information, quotations, markup for people and places, and any other element in TEI-compliant XML that delimits spans of text that are linguistically symmetrical in two languages. We evaluate this technique on two datasets, one containing perfectly transcribed texts and one containing errorful OCR, and achieve an accuracy rate of 88.2% projecting 13,023 XML tags from source documents to their transcribed translations, with an 83.6% accuracy rate when projecting to texts containing uncorrected OCR. This approach has the potential to allow a highly granular multilingual digital library to be bootstrapped by applying the knowledge contained in a small, heavily curated collection to a much larger but unstructured one.

Categories and Subject Descriptors H.3.7 [Information Systems: Information Storage and Retrieval]...
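
The general idea of projecting a tagged span through a word alignment can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `project_span` and `wrap`, the token-index representation, and the hand-written alignment below are all assumptions made for illustration. In practice the alignment would come from an automatic aligner and would be noisy.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: an alignment is a list of
# (source_token_index, target_token_index) pairs.
Alignment = List[Tuple[int, int]]
Span = Tuple[int, int]  # inclusive (start, end) token indices


def project_span(span: Span, alignment: Alignment) -> Optional[Span]:
    """Map an inclusive source span to the smallest target span
    covering every target token aligned to a token inside it."""
    start, end = span
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None  # nothing aligned, so the tag cannot be projected
    return (min(targets), max(targets))


def wrap(tokens: List[str], span: Span, tag: str) -> List[str]:
    """Return a copy of tokens with <tag>...</tag> around the span."""
    start, end = span
    return (tokens[:start] + [f"<{tag}>"] + tokens[start:end + 1]
            + [f"</{tag}>"] + tokens[end + 1:])


# Illustrative sentence pair with a hand-made alignment (an assumption).
source = ["Gallia", "est", "omnis", "divisa", "in", "partes", "tres"]
target = ["All", "Gaul", "is", "divided", "into", "three", "parts"]
alignment = [(0, 1), (1, 2), (2, 0), (3, 3), (4, 4), (5, 6), (6, 5)]

# A placeName tag covers source token 0 ("Gallia").
projected = project_span((0, 0), alignment)
tagged = wrap(target, projected, "placeName")
print(" ".join(tagged))
# prints: All <placeName> Gaul </placeName> is divided into three parts
```

Taking the minimum and maximum aligned index keeps the projected element contiguous, which a well-formed XML element requires, at the cost of possibly enclosing unaligned tokens that fall between them.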
David Bamman, Alison Babeu, Gregory Crane
Added 10 Jul 2010
Updated 10 Jul 2010
Type Conference
Year 2010
Where JCDL