Weiss, Y., Simoncelli, E. P., & Adelson, E. H. (2002). Motion illusions as optimal percepts. Nature Neuroscience, 5(6), 598–604. doi:10.1038/nn858

Summary

In visual perception research, there is the finding that sometimes a motion percept (such as a rhombus) appears to be moving horizontally, while other times it appears to be moving diagonally. Specifically, thin rhombuses with low contrast look as if they have diagonal motion (even though it is truly horizontal) and with high contrast they look like they have horizontal motion. For thick rhombuses, it always appears horizontal.

The explanation for these effects has been a combination of “intersection of constraints” (IOC), in which the claim is that people pay attention to e.g. the corners of the shape, or “vector average” (VA), in which people compute the vector normal of each dimension and average them. IOC predicts horizontal motion and VA predicts vertical motion.

Rather than applying these theories ad-hoc, Weiss et al. devise an ideal observer model whose behavior is consistent with these findings. They assume two key constraints:

local motion measurements are ambiguous
slow motions are more likely than fast ones

In addition to the rhombus results, this model qualitatively captures the results from a number of other related studies.

Methods

Stimuli: http://www.cs.huji.ac.il/~yweiss/Rhombus/rhombus.html

Algorithm

The assumption is that points in the world move but do not change their intensity over time, but that the observation of this constraint is noisy, i.e.:

$I(x,y,t)=I(x+v_x\delta t, y+v_y\delta t, t+\delta t) + \eta$

where $\eta\sim \mathcal{N}(0,\sigma)$. Computing the first-order Taylor series expansion:

$P(I(x_i,y_i,t)\vert v_i)\propto \exp\left(-\frac{1}{2\sigma^2}\int_{x,y} w_i(x,y)(\frac{\partial I}{\partial x}(x,y,t)v_x+\frac{\partial I}{\partial y}(x,y,t)v_t+\frac{\partial I}{\partial t}(x,y,t))^2\ \mathrm{d}x\ \mathrm{d}y\right)$

where $w_i(x,y)$ is a window centered on $(x_i,y_i)$. In practice they say they used “a small Gaussian window”, though they do not explicitly define what size that is.

They chose a prior to favor slow speeds:

$P(v)\propto \exp(-\lVert v\rVert ^2/2\sigma_p^2)$

And so the posterior is then:

$P(v\vert I)\propto P(v)\prod_{i:v_i=v} P(I(x_i,y_i,t)\vert v)$

where the product is computed over all locations $i$ that are moving with a common velocity $v$ (in practice this is done over the entire image, which is moving with the same velocity vector).

Takeaways

This has a nice tie-in with the approach of rational analyis. Rather than assuming that the visual system is performing some specific computation (e.g. IOC or VA), assume that the visual system is trying to solve a problem given certain constraints: what is the velocity vector of the image given noisy percepts and particular scene statistics?

They describe the effect of modulating the noise in the likelihood, but it would have also been interesting to see what the effect of changing the prior is. What if they assumed a uniform distribution of speeds? Or what is the prior were centered on the true velocity instead? Would you find that this model collapses into pure IOC behavior or pure VA behavior?

From playing around with this, it seems as though a uniform prior collapses to IOC. Having a prior centered on VA obviously biases towards VA, though the strength of that biases depends strongly on the variance. In other words, if the prior is Gaussian, even if it’s not zero-mean, you still get largely the same behavior as what they report in the paper.