Subword Variation in Text Message Classification

For millions of people in less-resourced regions of the world, text messages (SMS) provide the only regular contact with their doctor. Classifying messages by medical labels supports rapid responses to emergencies, the early identification of epidemics, and everyday administration, but challenges include text brevity, rich morphology, phonological variation, and limited training data. We present a novel system that addresses these, working with a clinic in rural Malawi and texts in the Chichewa language. We show that modeling morphological and phonological variation leads to a substantial average gain of F=0.206 and an error reduction of up to 63.8% for specific labels, relative to a baseline system optimized over word sequences. By comparison, there is no significant gain when applying the same system to the English translations of the same texts/labels, emphasizing the need for subword modeling in many languages. Language-independent morphological models perform as accurately as langu...
Robert Munro, Christopher D. Manning
Added 14 Feb 2011
Updated 14 Feb 2011
Type Journal
Year 2010