Applying Monte Carlo Techniques to Language Identification

Language identification systems have two major stages: the language modeling stage, where the distinctive features of languages are determined and stored in models, and the classification stage, in which the model of the (partial) input document is compared to the reference language models. The language model most similar to the input document represents the language of the document. We describe the best-known modeling and classification techniques in the literature, and identify a disadvantage they share: the need to create a model of the entire document, even though the language can be identified with a small number of features. To avoid this, we introduce a new language identification technique that is based on Monte Carlo sampling. We show that, by determining the language of a large enough number of random features, we can determine the document language to be the language that results most often from these features. Whether the number of samples is suffic...
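The sampling idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes character trigrams as features, assigns each sampled trigram to the language whose reference model gives it the highest relative frequency, and draws a fixed number of samples. The abstract's own criterion for deciding when enough samples have been drawn is cut off above and is not reproduced here. All names (`trigram_model`, `classify_feature`, `monte_carlo_identify`, `n_samples`, `seed`) are hypothetical.

```python
import random
from collections import Counter


def trigram_model(text):
    """Relative frequencies of the character trigrams in a reference text."""
    grams = [text[i:i + 3] for i in range(len(text) - 2)]
    total = len(grams)
    return {gram: count / total for gram, count in Counter(grams).items()}


def classify_feature(gram, models):
    """Language whose model gives the trigram the highest frequency.

    Returns None when no model contains the trigram.
    """
    best_lang, best_p = None, 0.0
    for lang, model in models.items():
        p = model.get(gram, 0.0)
        if p > best_p:
            best_lang, best_p = lang, p
    return best_lang


def monte_carlo_identify(document, models, n_samples=200, seed=0):
    """Vote over randomly sampled trigrams instead of modeling the whole document.

    n_samples is fixed here as a simplifying assumption.
    """
    if len(document) < 3:
        return None
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        start = rng.randrange(len(document) - 2)
        lang = classify_feature(document[start:start + 3], models)
        if lang is not None:
            votes[lang] += 1
    return votes.most_common(1)[0][0] if votes else None
```

A call might look like `monte_carlo_identify(text, {"english": trigram_model(english_corpus), "dutch": trigram_model(dutch_corpus)})`, where the two corpora are reference texts supplied by the caller. Only the sampled trigrams are looked up, so the cost depends on `n_samples` rather than on the document length.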
Arjen Poutsma
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2001
Where CLIN
Authors Arjen Poutsma