Increasingly large text datasets and the high dimensionality associated with natural language create a great challenge in text mining. In this research, a systematic study is cond...
M. Mahdi Shafiei, Singer Wang, Roger Zhang, Evange...
This paper presents Domain Relevance Estimation (DRE), a fully unsupervised text categorization technique based on the statistical estimation of the relevance of a text with respe...
Recently, an increasing attention has been paid to Mandarin word stress which is important for improving the naturalness of speech synthesis. Most of the research on Mandarin spee...
Ya Li, Jianhua Tao, Meng Zhang, Shifeng Pan, Xiaoy...
Abstract. The paper proposes a method to improve the extraction of lowfrequency translation equivalents from comparable corpora. Prior to performing the mapping between vector spac...
Viktor Pekar, Ruslan Mitkov, Dimitar Blagoev, Andr...
We demonstrate a new research approach to the problem of predicting the reading difficulty of a text passage, by recasting readability in terms of statistical language modeling. W...