Dayan, P., Hinton, G. E., Neal, R. M., & Zemel, R. S. (1995). The Helmholtz machine. Neural Computation, 7(5), 889–904. doi:10.1162/neco.1995.7.5.889

Summary

In this paper, Dayan et al. propose a method for learning the underlying structure of data with unsupervised learning in a neural network. Specifically, they construct the network to have bottom-up recognition weights and top-down generative weights. The network is then trained with the wake-sleep algorithm: the generative weights are trained during the “wake” phase, and the recognition weights are trained during the “sleep” phase by simulating training examples from the generative model.
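
For concreteness, the sketch below sets up the two sets of weights in a small layered binary network. The layer sizes, initialization scale, and names (`sizes`, `phi`, `theta`) are illustrative choices of mine, not taken from the paper.

```python
# Hypothetical setup of the two weight sets in a small Helmholtz machine.
# Layer sizes and initialization scale are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
sizes = [16, 8, 4]  # input (data) layer, then two hidden layers

# Recognition weights: bottom-up; phi[l] maps layer l to layer l+1
# (0-indexed from the data layer).
phi = [rng.normal(0.0, 0.1, size=(sizes[l], sizes[l + 1]))
       for l in range(len(sizes) - 1)]

# Generative weights: top-down; theta[l] maps layer l+1 back down to layer l.
theta = [rng.normal(0.0, 0.1, size=(sizes[l + 1], sizes[l]))
         for l in range(len(sizes) - 1)]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))
```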

Methods

n/a

Algorithm

The recognition probability of unit $j$ in layer $\ell$, given the sampled activities $s^{\ell-1}$ of the layer below, is:

$$q_j^\ell = \sigma\left(\sum_i s_i^{\ell-1} \phi_{ij}^\ell\right)$$

where $\sigma$ is the sigmoid function and $\phi$ are the recognition weights. As mentioned earlier, the recognition weights are trained during the sleep phase by simulating training data from the generative model.
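
A minimal sketch of this bottom-up pass, assuming binary stochastic units and omitting bias terms (the function name `recognition_pass` and its structure are mine):

```python
# Sketch of a bottom-up recognition pass over binary stochastic units
# (bias terms omitted; function name and structure are my own).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recognition_pass(d, phi, rng):
    """Sample binary hidden activities layer by layer given a data vector d."""
    states = [np.asarray(d, dtype=float)]
    for phi_l in phi:                     # phi_l maps the layer below to this layer
        q = sigmoid(states[-1] @ phi_l)   # recognition probability q_j^l
        states.append((rng.random(q.shape) < q).astype(float))
    return states                         # activities ordered from data layer to top
```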

The generative probability of unit $j$ in layer $\ell$, given the activities $s^{\ell+1}$ of the layer above, is:

$$p_j^\ell = \sigma\left(\sum_k s_k^{\ell+1} \theta_{kj}^\ell\right)$$

where $s$ are the unit activities and $\theta$ are the generative weights. During the wake phase, data are presented to the input units, hidden units are activated stochastically according to $q_j^\ell$, and the generative weights are then updated to minimize the KL divergence between the actual activations and the generative probabilities $p_j^\ell$.
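
The corresponding wake-phase update can be written as a local delta rule. The sketch below is a simplified version of mine (the learning rate, the bias-free form, and the name `wake_update` are assumptions, not the paper's):

```python
# Sketch of the wake-phase delta rule: after a recognition pass has sampled
# activities, each generative weight matrix is nudged so the top-down
# prediction p_j^l moves toward the actually sampled activity s_j^l.
# Learning rate and bias-free form are my simplifications.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def wake_update(states, theta, lr=0.01):
    """states: sampled activities from the recognition pass, bottom to top."""
    for l, theta_l in enumerate(theta):       # theta_l maps layer l+1 down to layer l
        s_above, s_below = states[l + 1], states[l]
        p = sigmoid(s_above @ theta_l)        # generative probability p_j^l
        theta_l += lr * np.outer(s_above, s_below - p)
    return theta
```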

The wake and sleep phases are alternated, and over time the weights should converge so that the recognition distribution $Q(\alpha \mid d)$ approximates the posterior $P(\alpha \mid d, \theta)$ of the generative model over explanations $\alpha$ of the data $d$.
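
Putting the pieces together, a hypothetical training loop might look like the following. It reuses the sketches above (`sizes`, `phi`, `theta`, `sigmoid`, `recognition_pass`, `wake_update`) and runs on toy binary data, so it illustrates the alternation rather than reproducing the paper's experiments.

```python
# Sketch of the full wake-sleep alternation, reusing the hypothetical pieces
# above (sizes, phi, theta, sigmoid, recognition_pass, wake_update).
import numpy as np

def sleep_update(phi, theta, sizes, rng, lr=0.01):
    """Sleep phase: sample a 'dream' top-down from the generative model, then
    train the recognition weights to recover each dreamed layer from the layer
    below it."""
    # Sample the top layer from a fixed 0.5 prior; the paper instead learns
    # generative biases for the top layer.
    states = [(rng.random(sizes[-1]) < 0.5).astype(float)]
    for theta_l in reversed(theta):              # ancestral sampling, top to bottom
        p = sigmoid(states[0] @ theta_l)
        states.insert(0, (rng.random(p.shape) < p).astype(float))
    # Delta rule on the recognition weights (mirror image of wake_update).
    for l, phi_l in enumerate(phi):
        s_below, s_above = states[l], states[l + 1]
        q = sigmoid(s_below @ phi_l)
        phi_l += lr * np.outer(s_below, s_above - q)
    return phi

rng = np.random.default_rng(1)
data = (rng.random((100, sizes[0])) < 0.3).astype(float)  # toy binary data

for epoch in range(50):
    for d in data:
        states = recognition_pass(d, phi, rng)      # wake: recognize the datum...
        theta = wake_update(states, theta)          # ...then train the generative weights
        phi = sleep_update(phi, theta, sizes, rng)  # sleep: dream, train recognition weights
```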

Takeaways

The Helmholtz machine is a very cool idea, in that it is an unsupervised way of (approximately) learning the structure of highly complex data by jointly training a recognition model and a generative model. It can be thought of as a way of implementing an approximate EM algorithm in a neural network; it can also be interpreted as a particular type of autoencoder (e.g., a one-layer Helmholtz machine is a folded-over autoencoder).