Chinese text contains no blanks to mark word boundaries. As a result, identifying words is difficult because of segmentation ambiguities and the occurrence of unknown words. Conve...
In this paper, the authors address the significance and complexity of tokenization, the initial step of NLP. The notions of word and token are discussed and defined from the viewpoin...
In OCR systems, the character segmentation algorithm may generate mis-segmented blocks. Feedback from the character classifier is indispensable for achieving higher character ...
A new algorithm, based on area-deviation, is proposed for the detection of corner points of digitized curves. The algorithm consists of two steps. In the first step, a fixed-lengt...
XML Topic Maps enable multiple, concurrent views of sets of information objects and can be used for different applications. For example, thesaurus-like interfaces to corpora, navig...