Parallel web pages are important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
The map-reduce model requires users to express their problem in terms of a map function that processes single records in a stream, and a reduce function that merges all mapped out...
Jackson H. C. Yeung, C. C. Tsang, Kuen Hung Tsoi, ...
In this paper, we propose the use of the Maximum Entropy approach for the task of automatic image annotation. Given labeled training data, Maximum Entropy is a statistical techniqu...
Abstract--Until quite recently, extending Phrase-based Statistical Machine Translation (PBSMT) with syntactic knowledge caused system performance to deteriorate. The most recent su...
Topic models are a useful tool for analyzing large text collections, but have previously been applied in only monolingual, or at most bilingual, contexts. Meanwhile, massive colle...
David M. Mimno, Hanna M. Wallach, Jason Naradowsky...