The paper deals with the linguistic problem of fully automatic grouping of semantically related words. We discuss the measures of semantic relatedness of basic word forms and descr...
In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a...
Xiaoyin Wang, David Lo, Jing Jiang, Lu Zhang, Hong...
The growing availability of spoken language corpora presents new opportunities for enriching the methodologies of speech and language therapy. In this paper, we present a novel ap...
In domains with insufficient matched training data, language models are often constructed by interpolating component models trained from partially matched corpora. Since the ngram...
The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our research is based on a German treebank corpus used...