Sciweavers

COLING
2010

EM-based Hybrid Model for Bilingual Terminology Extraction from Comparable Corpora

In this paper, we present an unsupervised hybrid model that combines statistical, lexical, linguistic, contextual, and temporal features in a generic EM-based framework to harvest bilingual terminology from comparable corpora under a comparable-document alignment constraint. The model is configurable for any language and can be extended with additional features. Overall, it yields a considerable improvement in performance over the baseline method. In addition, the model shows a promising ability to discover new bilingual terminology with limited use of dictionaries.
Lianhau Lee, AiTi Aw, Min Zhang, Haizhou Li
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2010
Where COLING
Authors Lianhau Lee, AiTi Aw, Min Zhang, Haizhou Li