Sciweavers

735 search results - page 17 / 147
» Corpora and data preparation
TSD
2001
Springer
15 years 2 months ago
Finding Semantically Related Words in Large Corpora
The paper deals with the linguistic problem of fully automatic grouping of semantically related words. We discuss the measures of semantic relatedness of basic word forms and descr...
Pavel Smrz, Pavel Rychlý
ACL
2009
14 years 7 months ago
Extracting Paraphrases of Technical Terms from Noisy Parallel Software Corpora
In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a...
Xiaoyin Wang, David Lo, Jing Jiang, Lu Zhang, Hong...
ACL
2010
14 years 8 months ago
How Spoken Language Corpora Can Refine Current Speech Motor Training Methodologies
The growing availability of spoken language corpora presents new opportunities for enriching the methodologies of speech and language therapy. In this paper, we present a novel ap...
Daniil Umanski, Federico Sangati
EMNLP
2008
14 years 11 months ago
N-gram Weighting: Reducing Training Data Mismatch in Cross-Domain Language Model Estimation
In domains with insufficient matched training data, language models are often constructed by interpolating component models trained from partially matched corpora. Since the ngram...
Bo-June Paul Hsu, James R. Glass
EACL
2003
ACL Anthology
14 years 11 months ago
Experiments on Candidate Data for Collocation Extraction
The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our research is based on a German treebank corpus used...
Stefan Evert, Hannah Kermes