Sciweavers

COLING
2010

Unsupervised cleansing of noisy text

In this paper we look at the problem of cleansing noisy text using a statistical machine translation model. Noisy text is produced in informal communications such as Short Message Service (SMS), Twitter and chat. A typical statistical machine translation system is trained on parallel text comprising noisy and clean sentences. We propose an unsupervised method for translating noisy text to clean text. Our method has two steps. First, for a given noisy sentence, a weighted list of possible clean tokens is obtained for each noisy token. The clean sentence is then obtained by maximizing the product of the weights from these lists and the language model scores.
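The two steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the candidate lists, their weights, the bigram table, the floor value for unseen bigrams, and the function name `decode` are all hypothetical and chosen only for illustration. The sketch assumes a bigram language model and uses a Viterbi-style search to pick the clean sequence that maximizes the product of candidate weights and language model scores.

```python
import math

# Step 1 (assumed output): noisy token -> weighted list of clean candidates.
# These entries and weights are invented for illustration.
CANDIDATES = {
    "c": [("see", 0.6), ("sea", 0.4)],
    "u": [("you", 0.9), ("your", 0.1)],
    "2moro": [("tomorrow", 1.0)],
}

# Hypothetical bigram language model probabilities P(word | previous word).
BIGRAM = {
    ("<s>", "see"): 0.3,
    ("<s>", "sea"): 0.1,
    ("see", "you"): 0.5,
    ("sea", "you"): 0.01,
    ("you", "tomorrow"): 0.4,
    ("your", "tomorrow"): 0.05,
}
FLOOR = 1e-4  # assumed probability for bigrams missing from the table


def decode(noisy_tokens):
    """Step 2: return the clean sequence with the highest combined score.

    The score is the product of candidate weights and bigram probabilities,
    accumulated in log space to avoid underflow.
    """
    # Maps the last clean token chosen so far to (log score, best path).
    best = {"<s>": (0.0, [])}
    for token in noisy_tokens:
        # Tokens without a candidate list are passed through unchanged.
        options = CANDIDATES.get(token, [(token, 1.0)])
        step = {}
        for clean, weight in options:
            for prev, (score, path) in best.items():
                lm = BIGRAM.get((prev, clean), FLOOR)
                total = score + math.log(weight) + math.log(lm)
                if clean not in step or total > step[clean][0]:
                    step[clean] = (total, path + [clean])
        best = step
    return max(best.values(), key=lambda entry: entry[0])[1]


print(" ".join(decode(["c", "u", "2moro"])))
```

With these toy tables, "c" resolves to "see" rather than "sea" because the language model strongly prefers "see you", even though the candidate weights alone are close. A real system would derive both tables from data rather than hard-coding them.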
Added 13 May 2011
Updated 13 May 2011
Type Journal
Year 2010
Where COLING
Authors Danish Contractor, Tanveer A. Faruquie, L. Venkata Subramaniam