Sciweavers

EMNLP
2010

Word-Based Dialect Identification with Georeferenced Rules

13 years 8 months ago
Word-Based Dialect Identification with Georeferenced Rules
We present a novel approach for (written) dialect identification based on the discriminative potential of entire words. We generate Swiss German dialect words from a Standard German lexicon with the help of hand-crafted phonetic/graphemic rules that are associated with occurrence maps extracted from a linguistic atlas created through extensive empirical fieldwork. In comparison with a charactern-gram approach to dialect identification, our model is more robust to individual spelling differences, which are frequently encountered in non-standardized dialect writing. Moreover, it covers the whole Swiss German dialect continuum, which trained models struggle to achieve due to sparsity of training data.
Yves Scherrer, Owen Rambow
Added 11 Feb 2011
Updated 11 Feb 2011
Type Journal
Year 2010
Where EMNLP
Authors Yves Scherrer, Owen Rambow
Comments (0)