Sciweavers

ICML
1998
IEEE
Learning a Language-Independent Representation for Terms from a Partially Aligned Corpus
Cross-language latent semantic indexing is a method that learns useful language-independent vector representations of terms through a statistical analysis of a document-aligned text...
Michael L. Littman, Fan Jiang, Greg A. Keim
CIARP
2009
Springer
Incorporating Linguistic Information to Statistical Word-Level Alignment
Parallel texts are enriched by alignment algorithms, thus establishing a relationship between the structures of the implied languages. Depending on the alignment level, t...
Eduardo Cendejas, Grettel Barceló, Alexande...
COLING
2010
An Empirical Study on Web Mining of Parallel Data
This paper presents an empirical approach to mining parallel corpora. Conventional approaches use a readily available collection of comparable, nonparallel corpora to extract par...
Gum-Won Hong, Chi-Ho Li, Ming Zhou, Hae-Chang Rim
ACL
1998
Bitext Correspondences through Rich Mark-up
Rich mark-up can considerably benefit the process of establishing bitext correspondences, that is, the task of providing correct identification and alignment methods for text segm...
Raquel Martínez, Joseba Abaitua, Arantza Ca...
EMNLP
2008
Language and Translation Model Adaptation using Comparable Corpora
Traditionally, statistical machine translation systems have relied on parallel bi-lingual data to train a translation model. While bi-lingual parallel data are expensive to genera...
Matthew G. Snover, Bonnie J. Dorr, Richard M. Schw...