Sciweavers

70 search results - page 14 / 14
» Using self-supervised word segmentation in Chinese informati...
LREC
2010
A Context Sensitive Variant Dictionary for Supporting Variant Selection
In Japanese, there are a large number of notational variants of words. This is because Japanese words are written in three kinds of characters: kanji (Chinese) characters, hiragar...
Aya Nishikawa, Ryo Nishimura, Yasuhiko Watanabe, Y...
KDD
2006
ACM
Extracting key-substring-group features for text classification
In many text classification applications, it is appealing to take every document as a string of characters rather than a bag of words. Previous research studies in this area mostl...
Dell Zhang, Wee Sun Lee
KDD
2004
ACM
Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods
We consider the problem of improving named entity recognition (NER) systems by using external dictionaries--more specifically, the problem of extending state-of-the-art NER system...
William W. Cohen, Sunita Sarawagi
MT
2007
Automatic extraction of translations from web-based bilingual materials
This paper describes the framework of the StatCan Daily Translation Extraction System (SDTES), a computer system that maps and compares web-based translation texts of Statistics Can...
Qibo Zhu, Diana Zaiu Inkpen, Ash Asudeh
ICDAR
1997
IEEE
Representing OCRed documents in HTML
OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Tao Hong, Sargur N. Srihari