Sciweavers

EMNLP
2008
13 years 6 months ago
Improved Sentence Alignment on Parallel Web Pages Using a Stochastic Tree Alignment Model
Parallel web pages are an important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
Lei Shi, Ming Zhou
LREC
2010
13 years 6 months ago
Evaluating Utility of Data Sources in a Large Parallel Czech-English Corpus CzEng 0.9
CzEng 0.9 is the third release of a large parallel corpus of Czech and English. For the current release, CzEng was extended by a significant amount of texts from various types of so...
Ondrej Bojar, Adam Liska, Zdenek Zabokrtský
IAT
2007
IEEE
13 years 11 months ago
An Intelligent Web Agent to Mine Bilingual Parallel Pages via Automatic Discovery of URL Pairing Patterns
This paper describes an intelligent agent to facilitate bitext mining from the Web via automatic discovery of URL pairing patterns (or keys) for retrieving parallel web pages. The...
Chunyu Kit, Jessica Yee Ha Ng