In this paper we show that exponentially deep belief networks [3, 7, 4] can approximate any distribution over binary vectors to arbitrary accuracy, even when the width of each layer is limited to the dimensionality of the data. This resolves an open problem in [6]. We further show that such networks can be greedily learned in an easy yet impractical way.
Ilya Sutskever, Geoffrey E. Hinton