This paper introduces new methods based on exponential families for modeling the correlations between words in text and speech. While previous work assumed the effects of word co-...
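For context, a generic exponential-family (log-linear) model of a word $w$ given its context $h$ (a standard form, not necessarily the specific parameterization used in this paper) is:

\[
p(w \mid h) = \frac{1}{Z(h)} \exp\big(\theta^\top f(w, h)\big), \qquad Z(h) = \sum_{w'} \exp\big(\theta^\top f(w', h)\big),
\]

where $f(w, h)$ collects co-occurrence features of the word and its context.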
Stochastic gradient descent (SGD) uses approximate gradients estimated from subsets of the training data and updates the parameters in an online fashion. This learning framework i...
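For reference, a standard form of the stochastic gradient update sketched here is (the step-size schedule $\eta_t$ and minibatch $B_t$ are generic notation, not taken from this abstract):

\[
\theta_{t+1} = \theta_t - \eta_t \, \frac{1}{|B_t|} \sum_{i \in B_t} \nabla_\theta \, \ell(\theta_t; x_i),
\]

where $B_t$ is the random subset of training examples drawn at step $t$ and $\ell$ is the per-example loss.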
Dative variation is a syntactic phenomenon widely observed across the world's languages (e.g., "I gave John a book" vs. "I gave a book to John"). It has been shown that which surface form will b...
Many statistical translation models can be regarded as weighted logical deduction. Under this paradigm, we use weights from the expectation semiring (Eisner, 2002) to compute fir...
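As a sketch of the weight structure involved, the first-order expectation semiring of Eisner (2002) operates on pairs $\langle p, r \rangle$ (a probability-like weight and an expectation term); the operations used when combining deductions are (notation ours, not from the abstract):

\[
\langle p_1, r_1 \rangle \oplus \langle p_2, r_2 \rangle = \langle p_1 + p_2,\; r_1 + r_2 \rangle, \qquad
\langle p_1, r_1 \rangle \otimes \langle p_2, r_2 \rangle = \langle p_1 p_2,\; p_1 r_2 + p_2 r_1 \rangle.
\]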
Unsupervised discovery of latent representations, in addition to being useful for density modeling, visualisation and exploratory data analysis, is increasingly important for...
Jasper Snoek, Ryan Prescott Adams, Hugo Larochelle