Sciweavers

70 search results - page 14 / 14
» Using self-supervised word segmentation in Chinese informati...
LREC
2010
179 views, Education
13 years 7 months ago
A Context Sensitive Variant Dictionary for Supporting Variant Selection
In Japanese, there are a large number of notational variants of words. This is because Japanese words are written in three kinds of characters: kanji (Chinese) characters, hiragar...
Aya Nishikawa, Ryo Nishimura, Yasuhiko Watanabe, Y...
KDD
2006
ACM
179 views, Data Mining
14 years 5 months ago
Extracting key-substring-group features for text classification
In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
Dell Zhang, Wee Sun Lee
KDD
2004
ACM
163 views, Data Mining
14 years 5 months ago
Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods
We consider the problem of improving named entity recognition (NER) systems by using external dictionaries--more specifically, the problem of extending state-of-the-art NER system...
William W. Cohen, Sunita Sarawagi
MT
2007
158 views
13 years 5 months ago
Automatic extraction of translations from web-based bilingual materials
This paper describes the framework of the StatCan Daily Translation Extraction System (SDTES), a computer system that maps and compares web-based translation texts of Statistics Can...
Qibo Zhu, Diana Zaiu Inkpen, Ash Asudeh
ICDAR
1997
IEEE
13 years 9 months ago
Representing OCRed documents in HTML
ABSTRACT: OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Tao Hong, Sargur N. Srihari