We present our efforts to create a large-scale, semi-automatically annotated parallel corpus of cleft constructions. The corpus is intended to reduce or make more effective the ma...
A major obstacle to the construction of a probabilistic translation model is the lack of large parallel corpora. In this paper we first describe a parallel text mining system that...
This paper presents a new corpus project, aiming at building a national corpus of Polish. What makes it different from a typical YACP (Yet Another Corpus Project) is 1) the fact t...
Unit selection text-to-speech systems currently produce very natural synthesized phrases by concatenating speech segments from a large database. Recently, increasing demand for de...
We have aligned Japanese and English news articles and sentences to make a large parallel corpus. We first used a method based on cross-language information retrieval (CLIR) to a...