Schulman, J., Ho, J., Lee, C., & Abbeel, P. (2013). Learning from Demonstrations Through the Use of Non-Rigid Registration. Proceedings of the 16th International Symposium on Robotics Research. Retrieved from http://www.cs.berkeley.edu/~pabbeel/papers/SchulmanHoLeeAbbeel_ISRR2013.pdf

Summary

Schulman et al. describe a way to learn from demonstration for performing tasks with deformable objects (e.g., rope tying). The core idea behind their method is to compute a geometric transformation $\mathbf{f}$ which maps a demonstrated (training) state onto the new (test) state, and which can then be applied to the demonstrated trajectory to produce a new trajectory. This approach, combined with a nearest-neighbors policy for selecting which demonstration to transform, works well as long as the nearest neighbor is close enough to the new state.
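To make the pipeline concrete for myself, here is a minimal sketch of "pick the nearest demonstration, fit $\mathbf{f}$, warp the trajectory". It is not the paper's method: the registration in the paper is non-rigid, whereas this sketch swaps in a 2D similarity transform (rotation, scale, translation) with known point correspondences, and it uses the registration residual as the nearest-neighbor distance. The function names and toy data are hypothetical.

```python
# Points are complex numbers (x + 1j*y). This is a stand-in for non-rigid
# registration: a least-squares 2D similarity transform f(z) = a*z + b.

def fit_similarity(src, dst):
    """Fit f(z) = a*z + b so that f(src[i]) is close to dst[i].

    Assumes known correspondences and at least two distinct src points.
    """
    n = len(src)
    src_mean = sum(src) / n
    dst_mean = sum(dst) / n
    num = sum((w - dst_mean) * (z - src_mean).conjugate()
              for z, w in zip(src, dst))
    den = sum(abs(z - src_mean) ** 2 for z in src)
    a = num / den
    b = dst_mean - a * src_mean
    return lambda z: a * z + b


def registration_cost(f, src, dst):
    """Sum of squared residuals after warping src onto dst."""
    return sum(abs(f(z) - w) ** 2 for z, w in zip(src, dst))


def transfer_trajectory(test_state, demos):
    """Choose the demo whose state registers best, then warp its trajectory.

    demos is a list of (train_state, train_traj) pairs.
    """
    best = None
    for train_state, train_traj in demos:
        f = fit_similarity(train_state, test_state)
        cost = registration_cost(f, train_state, test_state)
        if best is None or cost < best[0]:
            best = (cost, f, train_traj)
    cost, f, traj = best
    return [f(p) for p in traj], cost


# Hypothetical toy data: demo A is an L-shaped "rope", demo B is a straight one.
demo_a = ([0j, 1 + 0j, 1 + 1j], [0.5 + 0j, 0.5 + 1j])
demo_b = ([0j, 1 + 0j, 2 + 0j], [1 + 0.5j, 1 - 0.5j])

# The test state is demo A's state rotated by 90 degrees and shifted by 2.
test_state = [1j * p + 2 for p in demo_a[0]]
new_traj, cost = transfer_trajectory(test_state, [demo_a, demo_b])
```

Here demo A registers onto the test state with essentially zero residual, so its trajectory is the one that gets warped; demo B cannot be aligned by any similarity transform and is rejected.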

Their approach rests on the following property (the cost function invariance):

$L(\mathrm{State},\mathrm{Traj})=L(\mathbf{f}(\mathrm{State}),\mathbf{f}(\mathrm{Traj}))$

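where $L$ is the loss function. Because $\mathbf{f}$ is chosen so that $\mathbf{f}(\mathrm{State}_\mathrm{train})\approx\mathrm{State}_\mathrm{test}$, this property gives the following approximation: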

$\begin{align*} L(\mathrm{State}_\mathrm{test},\mathbf{f}(\mathrm{Traj}_\mathrm{train}))&\approx L(\mathbf{f}(\mathrm{State}_\mathrm{train}),\mathbf{f}(\mathrm{Traj}_\mathrm{train}))\\ &= L(\mathrm{State}_\mathrm{train},\mathrm{Traj}_\mathrm{train}) \end{align*}$

An alternate intuition for why this works is that of dynamics invariance, in which the “dynamics of the system are approximately invariant—more properly, covariant—under sufficiently smooth coordinate transformations”. This is true if and only if:

$\mathbf{f}(\Pi_\mathrm{Traj}(\mathrm{State}^t))=\Pi_{\mathbf{f}(\mathrm{Traj})}(\mathbf{f}(\mathrm{State}^t))$

where $\Pi_\mathrm{Traj}(\mathrm{State})$ applies trajectory $\mathrm{Traj}$ to state $\mathrm{State}$.

Of course, there are cases where this will not hold. The example they give in the paper is a rigid transformation where the “absolute orientation of the robot’s end-effector matters”. However, based on their results, it seems that for a wide range of states and trajectories it does hold well enough to generalize to new states.
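The covariance condition $\mathbf{f}(\Pi_\mathrm{Traj}(\mathrm{State}))=\Pi_{\mathbf{f}(\mathrm{Traj})}(\mathbf{f}(\mathrm{State}))$ can be checked numerically on a toy system. The sketch below is my own illustration, not an experiment from the paper: both "dynamics" functions, the transformation `f`, and the data are hypothetical. The first dynamics depends only on relative motion and satisfies the condition; the second adds a fixed downward sag, so an absolute direction matters and the condition fails.

```python
# Points are complex numbers (x + 1j*y).

def drag(traj, state):
    """Toy dynamics: a rigid grasp moves every state point by the
    gripper's net displacement along the trajectory."""
    d = traj[-1] - traj[0]
    return [p + d for p in state]


def drag_then_sag(traj, state):
    """Same as drag, but every point also drops by a fixed 0.1 in world -y,
    so the outcome depends on an absolute direction."""
    return [p - 0.1j for p in drag(traj, state)]


def f(z):
    """A smooth coordinate change: rotate by 90 degrees, scale by 0.8, shift."""
    return 0.8j * z + (1 + 2j)


def covariance_gap(dynamics, traj, state):
    """Largest distance between f(Pi_Traj(State)) and Pi_f(Traj)(f(State))."""
    lhs = [f(p) for p in dynamics(traj, state)]
    rhs = dynamics([f(p) for p in traj], [f(p) for p in state])
    return max(abs(l - r) for l, r in zip(lhs, rhs))


state = [0j, 1 + 0j, 1 + 1j]
traj = [0.5 + 0j, 0.5 + 1j, 2 + 1j]

gap_drag = covariance_gap(drag, traj, state)          # numerically zero
gap_sag = covariance_gap(drag_then_sag, traj, state)  # clearly nonzero
```

For `drag` the two sides agree up to floating-point error, while for `drag_then_sag` they differ because the transformed system still sags along the untransformed vertical axis.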

Takeaways

This is a clever use of nearest neighbors to generalize from examples without needing to have any knowledge of the domain (besides being able to perceive the position/topology of the relevant object, e.g. the rope). However, I wonder how far this will really generalize, even with many examples. For example, if you had to tie a rope around another object, I don’t think this would work as well—and this intuition seems to be correct, based on their experimental results (1/10 successes for the clove hitch, which involves tying a rope around a pole). I would expect it to be even harder if the task involved tying a rope around a nonrigid object. There is simply so much noise in the system that without some model of how the objects behave, you are going to quickly run into scenarios that you don’t have examples for.

I also think you would probably quickly run into scenarios where the outcome doesn’t vary smoothly with the state, so that a smooth transformation between states is not enough. For example, if the task were to place a block on a stack of blocks and have the tower remain stable, the stability of the tower doesn’t vary smoothly as a function of the positions of the blocks. So even if you had encountered a very similar tower in the past, it’s entirely possible that even a slight difference between that tower and the current one could produce a discontinuous change in the outcome: for example, in the first tower you need to place the block in position A to keep the tower stable, but position A in the second tower will always be unstable.
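A tiny sketch of that discontinuity, under a heavily idealised assumption of my own (a block is stable exactly when its centre of mass lies over the supporting block, in one dimension, ignoring friction and everything below the top of the tower). The function and the numbers are hypothetical.

```python
def is_stable(block_center, support_left, support_right):
    """Idealised 1D stability test: the placed block stays put only if its
    centre of mass lies over the top surface of the supporting block."""
    return support_left <= block_center <= support_right


# Two nearly identical towers: the top block of tower B is shifted by 2 mm.
tower_a_top = (0.000, 1.000)   # (left edge, right edge) in metres
tower_b_top = (0.002, 1.002)

position_a = 0.001             # placement that worked in the demonstration

stable_a = is_stable(position_a, *tower_a_top)
stable_b = is_stable(position_a, *tower_b_top)
```

The two states differ by only 2 mm, yet the same placement is stable on one tower and unstable on the other, which is exactly the kind of jump that nearest-neighbor transfer alone would not anticipate.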