ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora

The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles to further advances in automated translation. A possible solution is to exploit comparable corpora (non-parallel bilingual or multilingual text resources), which are much more widely available than parallel translation data. Our toolkit extracts parallel content from comparable corpora. Its tools are bundled into two workflows: (1) alignment of comparable documents and extraction of parallel sentences, and (2) extraction and bilingual mapping of terms and named entities. Together, these workflows pair similar bilingual comparable documents and extract parallel sentences as well as bilingual terminology and named-entity dictionaries. This demonstration focuses on English, Latvian, Lithuanian, and Romanian.

Marcis Pinnis, Radu Ion, Dan Stefanescu, Fangzhong Su, Inguna Skadina, Andrejs Vasiljevs, Bogdan Babych

ACL 2012 | Computational Linguistics | Parallel Corpora | Parallel Translation | Romanian Languages

» Creating a Persian-English Comparable Corpus

» An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment

» Using Local Alignments for Relation Recognition

» MARS Multilingual Access and Retrieval System with Enhanced Query Translation and Document...

» Coupled temporal scoping of relational facts

Post Info

Added	29 Sep 2012
Updated	29 Sep 2012
Type	Journal
Year	2012
Where	ACL
Authors	Marcis Pinnis, Radu Ion, Dan Stefanescu, Fangzhong Su, Inguna Skadina, Andrejs Vasiljevs, Bogdan Babych
