COLING
1992

Tokenization As The Initial Phase In NLP

In this paper, the authors address the significance and complexity of tokenization, the initial step of NLP. Notions of word and token are discussed and defined from the viewpoints of lexicography and pragmatic implementation, respectively. Automatic segmentation of Chinese words is presented as an illustration of tokenization. Practical approaches to identification of compound tokens in English, such as idioms, phrasal verbs and fixed expressions, are developed.
Jonathan J. Webster, Chunyu Kit
Type Conference
Year 1992
Where COLING
Authors Jonathan J. Webster, Chunyu Kit
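
To make the idea of compound tokens in the abstract concrete, here is a minimal sketch of greedy longest-match tokenization against a small lexicon. It is an illustration under stated assumptions, not the authors' method: the function name `tokenize`, the `max_len` parameter, and the `COMPOUNDS` lexicon are all hypothetical and do not come from the paper.

```python
def tokenize(text, compounds, max_len=4):
    """Split on whitespace, then greedily merge the longest known compound.

    `compounds` is a set of lowercase multiword expressions (hypothetical
    lexicon); `max_len` caps how many words a compound may span.
    """
    words = text.split()
    tokens = []
    i = 0
    while i < len(words):
        # Try the longest candidate span first, shrinking down to one word.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            # A single word is always accepted as a fallback token.
            if n == 1 or candidate.lower() in compounds:
                tokens.append(candidate)
                i += n
                break
    return tokens


# Hypothetical lexicon: an idiom, a phrasal verb, and a fixed expression.
COMPOUNDS = {"kick the bucket", "give up", "by and large"}

print(tokenize("By and large they never give up", COMPOUNDS))
# ['By and large', 'they', 'never', 'give up']
```

The same greedy scheme, applied to characters instead of whitespace-separated words, is commonly known as forward maximum matching and is a simple baseline for Chinese word segmentation. Whether the paper uses this particular strategy is not stated in the abstract.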