Sciweavers

EMNLP
2008

A Discriminative Candidate Generator for String Transformations

String transformation, which maps a source string s into its desired form t, underlies applications such as stemming, lemmatization, and spelling correction. The essential step in string transformation is generating the candidates into which the given string s is likely to be transformed. This paper presents a discriminative approach to generating candidate strings. We use substring substitution rules as features and score them with an L1-regularized logistic regression model. We also propose a procedure for generating negative instances that affect the decision boundary of the model. The advantage of this approach is that candidate strings can be enumerated by an efficient algorithm, because the string transformation process is tractable in the model. We demonstrate the remarkable performance of the proposed method in normalizing inflected words and spelling variations.
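As a rough illustration of the idea in the abstract, the Python sketch below extracts a substring substitution rule from a (source, target) pair and uses per-rule weights to score candidates with a logistic function. It is a simplified sketch under stated assumptions, not the authors' implementation: the function names `extract_rule` and `generate_candidates`, the one-character left context, the restriction to a single rule application per candidate, and the weight values in the usage note are all hypothetical. In the paper the weights would come from training an L1-regularized logistic regression model; here they are supplied by hand.

```python
import math


def extract_rule(source, target, context=1):
    """Return a (lhs, rhs) substitution rule that rewrites source into target.

    The shared prefix and suffix are stripped, and up to `context`
    characters of left context are kept (an assumption of this sketch).
    """
    shortest = min(len(source), len(target))
    # Length of the longest common prefix.
    p = 0
    while p < shortest and source[p] == target[p]:
        p += 1
    # Length of the longest common suffix that does not overlap the prefix.
    q = 0
    while q < shortest - p and source[-1 - q] == target[-1 - q]:
        q += 1
    start = max(0, p - context)
    return source[start:len(source) - q], target[start:len(target) - q]


def generate_candidates(source, weights, bias=0.0):
    """Apply each nonzero-weight rule once and rank the resulting strings.

    `weights` maps (lhs, rhs) rules to hypothetical learned weights.
    Rules with zero weight are skipped, mirroring the sparsity that
    L1 regularization tends to produce.
    """
    candidates = {}
    for (lhs, rhs), w in weights.items():
        if w == 0.0:
            continue
        pos = source.find(lhs)
        while pos != -1:
            cand = source[:pos] + rhs + source[pos + len(lhs):]
            # Logistic score for a candidate produced by this single rule.
            score = 1.0 / (1.0 + math.exp(-(bias + w)))
            candidates[cand] = max(candidates.get(cand, 0.0), score)
            pos = source.find(lhs, pos + 1)
    return sorted(candidates.items(), key=lambda kv: -kv[1])
```

For example, `extract_rule("studies", "study")` yields the rule `("dies", "dy")`, and calling `generate_candidates("studies", {("dies", "dy"): 2.0, ("s", ""): 0.5})` ranks `study` ahead of the candidates produced by the lower-weight rule. Enumeration stays cheap in this sketch because only rules with nonzero weight are ever applied.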
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where EMNLP
Authors Naoaki Okazaki, Yoshimasa Tsuruoka, Sophia Ananiadou, Jun-ichi Tsujii