Statistical Machine Translation (SMT) is based on alignment models which learn from bilingual corpora the word correspondences between source and target language. These models are...
In this paper we describe some studies of Portuguese-English word alignment, focusing on (i) measuring the importance of the coupling between dictionaries and corpus; (ii) assessi...
Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive colle...
David M. Mimno, Hanna M. Wallach, Jason Naradowsky...
Multilingual parallel text corpora provide a powerful means for propagating linguistic knowledge across languages. We present a model which jointly learns linguistic structure for...
We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The ...
Steven Bird, David Day, John S. Garofolo, John Hen...