
COLING
1996

Identifying the Coding System and Language of On-line Documents on the Internet

This paper proposes a new algorithm that simultaneously identifies the coding system and the language of a code string fetched from the Internet, especially the World-Wide Web. The algorithm uses statistical language models both to select the correctly decoded string and to determine its language. It covers 9 languages and 11 coding systems used in East Asia and Western Europe. Experimental results show an accuracy of over 95% on 640 on-line documents.
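The idea in the abstract can be illustrated with a minimal sketch: decode the byte string under every candidate coding system, score each successfully decoded string with one language model per language, and keep the best-scoring (coding system, language) pair. This is not the paper's implementation. The character-unigram models, the tiny training samples, the `UNSEEN_LOG_PROB` floor, and the names `train_char_model`, `avg_log_prob`, and `identify` are all assumptions made for illustration; the abstract does not specify the models actually used.

```python
import math
from collections import Counter

# Assumed floor for characters a model never saw during training.
UNSEEN_LOG_PROB = math.log(1e-6)


def train_char_model(sample):
    """Build a toy character-unigram model: char -> log relative frequency."""
    counts = Counter(sample)
    total = sum(counts.values())
    return {ch: math.log(n / total) for ch, n in counts.items()}


def avg_log_prob(model, text):
    """Average per-character log-probability of text under a model."""
    if not text:
        return float("-inf")
    return sum(model.get(ch, UNSEEN_LOG_PROB) for ch in text) / len(text)


def identify(data, encodings, models):
    """Return (encoding, language, decoded_text) with the best score, or None."""
    best = None
    for encoding in encodings:
        try:
            decoded = data.decode(encoding)
        except UnicodeDecodeError:
            continue  # this coding system cannot have produced these bytes
        for language, model in models.items():
            score = avg_log_prob(model, decoded)
            if best is None or score > best[0]:
                best = (score, encoding, language, decoded)
    return None if best is None else best[1:]


# Hypothetical, deliberately tiny training samples (illustration only).
MODELS = {
    "ja": train_char_model("これは日本語の文章です。日本語の文字コードを判定します。"),
    "en": train_char_model("the coding system and language of a document on the internet"),
}
ENCODINGS = ["utf_8", "euc_jp", "shift_jis", "latin_1"]

raw = "日本語の文章です".encode("shift_jis")
print(identify(raw, ENCODINGS, MODELS))
```

A wrong decoding either fails outright or yields characters that no model has seen, so it scores poorly under every language. Pure ASCII input decodes identically under several of these coding systems, so in this sketch the reported coding system for such input is simply the first candidate that ties for the best score.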
Gen-itiro Kikui
Added 02 Nov 2010
Updated 02 Nov 2010
Type Conference
Year 1996
Where COLING
Authors Gen-itiro Kikui