In Japanese, there are a large number of notational variants of words. This is because Japanese words are written in three kinds of characters: kanji (Chinese) characters, hiragar...
Aya Nishikawa, Ryo Nishimura, Yasuhiko Watanabe, Y...
In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
We consider the problem of improving named entity recognition (NER) systems by using external dictionaries--more specifically, the problem of extending state-of-the-art NER system...
This paper describes the framework of the StatCan Daily Translation Extraction System (SDTES), a computer system that maps and compares webbased translation texts of Statistics Can...
ABSTRACT: OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...