Model-free probabilistic movement primitives for physical interaction
Paraschos, A., Rueckert, E., Peters, J., & Neumann, G. (2015). Model-Free Probabilistic Movement Primitives for Physical Interaction. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Retrieved from http://www.ausy.tu-darmstadt.de/uploads/Team/PubAlexParaschos/Paraschos_IROS_2015.pdf
Summary
In this work (and in previous work), Paraschos et al. present a framework for learning “probabilistic movement primitives” (ProMPs), which is essentially a way of learning a distribution over trajectories. In this paper, they explicitly extend the trajectory distributions to cover not just states, but also controls (forces/torques) and sensory information. That is, they define a trajectory as:
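(Reconstructing the notation from the ProMP papers, so the exact symbols may differ:)

$$\mathbf{y}_t = \left[ \mathbf{q}_t^\top,\ \dot{\mathbf{q}}_t^\top,\ \mathbf{u}_t^\top,\ \mathbf{s}_t^\top \right]^\top, \qquad \boldsymbol{\tau} = \{ \mathbf{y}_t \}_{t=1}^{T}$$

where $\mathbf{q}_t$ and $\dot{\mathbf{q}}_t$ are the joint positions and velocities, $\mathbf{u}_t$ the controls, and $\mathbf{s}_t$ the sensor readings.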
They then represent the trajectory in terms of basis functions:
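(Again, modulo notation, this is the standard ProMP parameterization:)

$$\mathbf{y}_t = \boldsymbol{\Psi}_t \mathbf{w} + \boldsymbol{\epsilon}_y, \qquad \boldsymbol{\epsilon}_y \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_y)$$

where $\boldsymbol{\Psi}_t$ contains the basis functions evaluated at time $t$ and $\mathbf{w}$ is a vector of weights.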
and learn the distribution over trajectories with a hierarchical Bayesian model, marginalizing out the weights:
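(Roughly, the weights get a Gaussian prior whose parameters $\boldsymbol{\theta} = \{ \boldsymbol{\mu}_w, \boldsymbol{\Sigma}_w \}$ are fit to the demonstrations:)

$$p(\boldsymbol{\tau}; \boldsymbol{\theta}) = \int p(\boldsymbol{\tau} \vert \mathbf{w})\, p(\mathbf{w}; \boldsymbol{\theta})\, d\mathbf{w}, \qquad p(\mathbf{w}; \boldsymbol{\theta}) = \mathcal{N}(\mathbf{w} \vert \boldsymbol{\mu}_w, \boldsymbol{\Sigma}_w)$$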
Paraschos et al. assume that pretty much everything is Gaussian, which allows them to derive nice analytical solutions throughout, including for the controller $p(\mathbf{u}_t\vert \tilde{\mathbf{y}}_t)$ (where $\tilde{\mathbf{y}}_t$ is the observable state of the system).
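In particular, since the state and controls at each time step are jointly Gaussian, the controller essentially falls out of the standard Gaussian conditioning identity (this is the generic identity, not necessarily the exact form they derive):

$$\begin{bmatrix} \tilde{\mathbf{y}}_t \\ \mathbf{u}_t \end{bmatrix} \sim \mathcal{N}\!\left( \begin{bmatrix} \boldsymbol{\mu}_{\tilde{y}} \\ \boldsymbol{\mu}_{u} \end{bmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_{\tilde{y}\tilde{y}} & \boldsymbol{\Sigma}_{\tilde{y}u} \\ \boldsymbol{\Sigma}_{u\tilde{y}} & \boldsymbol{\Sigma}_{uu} \end{bmatrix} \right) \;\Rightarrow\; p(\mathbf{u}_t \vert \tilde{\mathbf{y}}_t) = \mathcal{N}\!\left( \boldsymbol{\mu}_u + \boldsymbol{\Sigma}_{u\tilde{y}} \boldsymbol{\Sigma}_{\tilde{y}\tilde{y}}^{-1} (\tilde{\mathbf{y}}_t - \boldsymbol{\mu}_{\tilde{y}}),\ \boldsymbol{\Sigma}_{uu} - \boldsymbol{\Sigma}_{u\tilde{y}} \boldsymbol{\Sigma}_{\tilde{y}\tilde{y}}^{-1} \boldsymbol{\Sigma}_{\tilde{y}u} \right)$$

The conditional mean is linear in $\tilde{\mathbf{y}}_t$, so at each time step this behaves like a (time-varying) linear feedback controller.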
As described in their previous paper (NIPS ’13), this type of system has a lot of nice properties: it is easy to condition on a particular aspect of a trajectory; to combine multiple movement primitives by multiplying their distributions; to blend multiple primitives according to an activation function; etc. By also including the control information, they can produce not only the desired trajectory in these scenarios, but also the controls to achieve it. This approach is somewhat similar to that of Lee et al.
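To make the conditioning operation concrete, here is a minimal numpy sketch (my own names and shapes, not code from the paper) of conditioning a Gaussian weight distribution on a desired via-point, assuming the weight-space formulation above:

```python
import numpy as np

def condition_on_viapoint(mu_w, Sigma_w, Psi_t, y_star, Sigma_y_star):
    """Condition p(w) = N(mu_w, Sigma_w) on observing y_star ~ N(Psi_t w, Sigma_y_star)
    at a single time step, and return the parameters of p(w | y_star)."""
    # Predictive covariance of the observation under the prior over weights
    S = Sigma_y_star + Psi_t @ Sigma_w @ Psi_t.T
    # Gain matrix K = Sigma_w Psi_t^T S^{-1}, computed via a solve for stability
    K = np.linalg.solve(S, Psi_t @ Sigma_w).T
    # Standard Gaussian conditioning update
    mu_new = mu_w + K @ (y_star - Psi_t @ mu_w)
    Sigma_new = Sigma_w - K @ Psi_t @ Sigma_w
    return mu_new, Sigma_new

# Toy usage: 10 weights, a 2-D observation at one time step
rng = np.random.default_rng(0)
mu_w, Sigma_w = np.zeros(10), np.eye(10)
Psi_t = rng.normal(size=(2, 10))
mu_c, Sigma_c = condition_on_viapoint(mu_w, Sigma_w, Psi_t,
                                      y_star=np.array([0.5, -0.2]),
                                      Sigma_y_star=1e-4 * np.eye(2))
```

Combining two primitives is similarly just a product of Gaussians over the weights, which stays Gaussian.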
Takeaways
This is a cool paper that really demonstrates the power of maintaining a full generative model. It brings up an interesting distinction between model-based control and generative models: in this paper, Paraschos et al. are explicitly not using a model-based approach for controlling the robot (i.e., there is no dynamics model of the robot or the object), but they do estimate a full joint distribution over pose, velocity, force, and sensor data. With enough data, such a joint distribution could be essentially analogous to learning a model for control; the model is just not explicitly represented. You could even simulate forward by conditioning on the previous state.
As with all of these papers that I’ve been reading, though, it seems like approaches that rely solely on demonstrations won’t be able to generalize to novel objects or actions. I suppose that people do have tons of experience; within only a few weeks of birth, babies will already have been exposed to thousands of trajectories (of course, those are not expert demonstrations). I wonder how well this sort of approach would work with some type of self-supervised learning: i.e., you learn the joint distribution over states and actions, but the examples you have are self-generated and are not as reliable as expert demonstrations. I also wonder if you could combine this with something like Xie et al., where instead of learning model-free controls, you do learn an explicit model, but you learn it online along with the rest of the distribution.