Toward text message normalization: Modeling abbreviation generation

This paper describes a text normalization system for deletion-based abbreviations in informal text. We propose using statistical classifiers to learn the probability of deleting a given character, using features based on the character's context, its position in the word and in its containing syllable, and its function within the word. To ensure that our system is robust to different and previously unseen abbreviations for a word, we generate multiple abbreviation hypotheses for each word from the classifiers' predictions. We then reverse the mappings to enable recovery of English words from the abbreviations. Several knowledge sources are used to disambiguate word candidates: abbreviation likelihood, length, and language model scores. Our results show that this approach is feasible and warrants further exploration.
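The pipeline outlined in the abstract (score character deletions, generate several abbreviation hypotheses per word, reverse the mapping, then rank candidate words) can be illustrated with a toy sketch. This is not the authors' implementation: `deletion_prob` is a hand-written stand-in for the trained classifiers, the probabilities in it are invented for illustration, and a small unigram table substitutes for the language model. The length feature is omitted, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

VOWELS = set("aeiou")


def deletion_prob(word, i):
    """Toy stand-in for a trained classifier: P(delete character i of word).

    The numbers are made up for illustration, not learned from data.
    """
    ch = word[i]
    if i == 0:
        return 0.02  # assume first letters are rarely dropped
    if ch in VOWELS:
        return 0.6   # assume vowels are dropped often
    if ch == word[i - 1]:
        return 0.5   # second letter of a doubled pair
    return 0.1


def abbreviation_hypotheses(word, top_n=10):
    """Return the top_n (abbreviation, probability) pairs for a word.

    Enumerates every keep/delete pattern, so it is exponential in word
    length and only suitable for short words in a demo.
    """
    probs = [deletion_prob(word, i) for i in range(len(word))]
    scored = {}
    for mask in product((False, True), repeat=len(word)):  # True = delete
        if not any(mask):
            continue  # skip the unabbreviated word itself
        abbr = "".join(c for c, d in zip(word, mask) if not d)
        if not abbr:
            continue  # skip the empty string
        p = 1.0
        for deleted, q in zip(mask, probs):
            p *= q if deleted else 1.0 - q
        # Different deletion patterns can yield the same string; sum them.
        scored[abbr] = scored.get(abbr, 0.0) + p
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def build_reverse_table(vocab, top_n=10):
    """Reverse the word -> abbreviation mapping into abbreviation -> words."""
    table = defaultdict(list)
    for word in vocab:
        for abbr, p in abbreviation_hypotheses(word, top_n):
            table[abbr].append((word, p))
    return table


def normalize(abbr, table, unigram):
    """Rank candidate words by abbreviation likelihood times a unigram prior."""
    candidates = table.get(abbr, [])
    ranked = sorted(
        candidates,
        key=lambda wp: wp[1] * unigram.get(wp[0], 1e-6),
        reverse=True,
    )
    return [word for word, _ in ranked]


# Hypothetical vocabulary and unigram probabilities for the demo.
unigram = {"text": 0.5, "tomorrow": 0.2, "taxi": 0.01}
table = build_reverse_table(unigram.keys())
print(normalize("txt", table, unigram))
print(normalize("tmrw", table, unigram))
```

Because the lookup table is built from generated hypotheses rather than from a fixed abbreviation dictionary, an abbreviation never seen in training can still be mapped back to a word, provided it appears among that word's top hypotheses.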
Deana Pennell, Yang Liu
Added 21 Aug 2011
Updated 21 Aug 2011
Type Journal
Year 2011