WWW
2009
ACM

Crawling English-Japanese person-name transliterations from the web

Automatic compilation of lexicons is a dream of lexicon compilers as well as lexicon users. This paper proposes a system that crawls English-Japanese person-name transliterations from the Web and works as a back-end collector for the automatic compilation of a bilingual person-name lexicon. Our crawler collected 561K transliterations in five months. From them, an English-Japanese person-name lexicon with 406K entries was compiled by automatic post-processing. This lexicon is much larger than other similar resources, including the English-Japanese lexicon of HeiNER obtained from Wikipedia.

Categories and Subject Descriptors: I.2.7 [Artificial Intelligence]: Natural Language Processing; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval
General Terms: Algorithms, Design, Experimentation
Keywords: Mining transliteration pairs, automatic lexicon compilation, person name
Satoshi Sato
Added 21 Nov 2009
Updated 21 Nov 2009
Type Conference
Year 2009
Where WWW
Authors Satoshi Sato