LREC
2008

Producing a Test Collection for Patent Machine Translation in the Seventh NTCIR Workshop

To support research and development on machine translation, we produced a test collection for Japanese-English machine translation in the seventh NTCIR Workshop. This paper describes the test collection in detail. From patent documents published in Japan and the United States, we extracted patent families as a parallel corpus. A patent family is a set of patent documents for the same or related inventions, and these documents are usually filed in more than one country in different languages. In the parallel corpus, we aligned Japanese sentences with their counterpart English sentences. Our test collection, which includes approximately 2,000,000 sentence pairs, can be used to train and test machine translation systems. It also includes search topics for cross-lingual patent retrieval, so the contribution of machine translation to a patent retrieval task can also be evaluated. Our test collection will be available to the public for research purposes after the NTCIR fin...
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC
Authors Atsushi Fujii, Masao Utiyama, Mikio Yamamoto, Takehito Utsuro