Sciweavers

79 search results - page 13 / 16
» Self-Supervised Chinese Word Segmentation
TALIP
2002
14 years 11 months ago
Toward a unified approach to statistical language modeling for Chinese
This paper presents a unified approach to Chinese statistical language modeling (SLM). Applying SLM techniques like trigram language models to Chinese is challenging because (1) t...
Jianfeng Gao, Joshua Goodman, Mingjing Li, Kai-Fu ...
ACL
2008
15 years 1 months ago
Text Segmentation with LDA-Based Fisher Kernel
In this paper we propose a domain-independent text segmentation method, which consists of three components. Latent Dirichlet allocation (LDA) is employed to compute words semantic ...
Qi Sun, Runxin Li, Dingsheng Luo, Xihong Wu
COLING
1992
15 years 25 days ago
Tokenization As The Initial Phase In NLP
In this paper, the authors address the significance and complexity of tokenization, the beginning step of NLP. Notions of word and token are discussed and defined from the viewpoin...
Jonathan J. Webster, Chunyu Kit
IUCS
2009
ACM
15 years 6 months ago
Automatic extraction of bilingual terms from a Chinese-Japanese parallel corpus
This paper proposes a new approach for the automatic extraction of bilingual terms from a domain-specific bilingual parallel corpus. We combine existing monolingual term extractor...
Xiaorong Fan, Nobuyuki Shimizu, Hiroshi Nakagawa
TASLP
2002
14 years 11 months ago
Discriminating capabilities of syllable-based features and approaches of utilizing them for voice retrieval of speech informatio
With the rapidly growing use of the audio and multimedia information over the Internet, the technology for retrieving speech information using voice queries is becoming more and mo...
Berlin Chen, Hsin-Min Wang, Lin-Shan Lee