In this paper we present a novel approach, called “Text to Pronunciation (TtP)”, for the proper normalization of Non-Standard Words (NSWs) in unrestricted texts. The methodolog...
Word form normalization through lemmatization or stemming is a standard procedure in information retrieval because morphological variation needs to be accounted for and several la...
We describe a model for the lexical analysis of Arabic text, using the lists of alternatives supplied by a broad-coverage morphological analyzer, SAMA, which include stable lemma ...
Rushin Shah, Paramveer S. Dhillon, Mark Liberman, ...