Nguyen-Tuong, D., & Peters, J. (2011). Model Learning for Robot Control: A Survey. Cognitive Processing, 12, 319–340. doi:10.1007/s10339-011-0404-1

As this is a review paper, I will just give a summary and takeaways, rather than a methods/algorithms section.

Summary

This was a really nice survey of how dynamics models have been used in robotics (and in particular in control). First, Nguyen-Tuong & Peters distinguish between different model types:

  • Forward models – predict $s_{t+1}$ from $s_t$ and $a_t$
  • Inverse models – infer $a_t$ from $s_t$ and $s_{t+1}$
  • Mixed models – a combination of forward and inverse models; the forward model helps to constrain the inverse model, which is frequently ill-posed (i.e. there is no unique solution)
  • Multi-step prediction models – predict the state over the next $k$ timesteps. This is more than just applying the forward model $k$ times, since naively chaining one-step predictions lets errors accumulate. Instead there are “autoregressive” methods that use past predicted values to predict future values (see the sketch just after this list).
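
To make the forward-model and multi-step ideas concrete, here is a minimal sketch; the toy system, its coefficients, and the use of scikit-learn are my own choices, not from the survey:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical 1-D toy system (invented for illustration):
# s_{t+1} = 0.9 * s_t + 0.5 * a_t + noise.
rng = np.random.default_rng(0)
s = rng.normal(size=1000)
a = rng.normal(size=1000)
s_next = 0.9 * s + 0.5 * a + 0.01 * rng.normal(size=1000)

# Direct forward model: regress s_{t+1} on (s_t, a_t).
forward = LinearRegression().fit(np.column_stack([s, a]), s_next)

def rollout(model, s0, actions):
    """Naive multi-step prediction: feed each one-step prediction back in
    as the next state. Errors compound over the horizon, which is what
    motivates the autoregressive multi-step models the survey describes."""
    states = [s0]
    for a_t in actions:
        states.append(model.predict([[states[-1], a_t]])[0])
    return states

print(rollout(forward, s0=0.0, actions=[1.0, -0.5, 0.2]))
```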

Second, Nguyen-Tuong & Peters discuss different architectures for learning these different types of models:

  • Direct modeling – learn a direct mapping between inputs and outputs. This can be used to learn all the model types described above, though it only works for inverse and mixed models when the underlying forward mapping is invertible, so that the inverse is a well-defined function.
  • Indirect modeling – use error from a feedback controller to optimize the model. This can be used for learning inverse and mixed model types.
  • Distal teacher – first learn a forward model, and then use the learned forward model to generate errors for training an inverse model (sketched just below). This is mainly used for mixed models.
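
Here is a minimal sketch of the distal-teacher idea with linear models; the redundant toy system, dimensions, and learning rate are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy redundant system (invented): s' = s + a1 + a2, so the inverse
# mapping (s, s') -> (a1, a2) has no unique solution.
s = rng.normal(size=(500, 1))
a = rng.normal(size=(500, 2))
s_next = s + a.sum(axis=1, keepdims=True)

# Step 1: fit the forward model f(s, a) -> s' by least squares.
X = np.hstack([s, a])
W_f, *_ = np.linalg.lstsq(X, s_next, rcond=None)

# Step 2: train a linear inverse model g(s, s*) -> (a1, a2) using only
# the "distal" error at the forward model's output, backpropagated
# through the frozen forward model.
W_g = np.zeros((2, 2))
for _ in range(2000):
    states = rng.normal(size=(64, 1))
    targets = rng.normal(size=(64, 1))
    Z = np.hstack([states, targets])
    actions = Z @ W_g
    pred = np.hstack([states, actions]) @ W_f
    err = pred - targets                       # distal error at the output
    W_g -= 0.1 * Z.T @ (err @ W_f[1:].T) / 64  # chain rule through f

# Reaching s* = 1 from s = 0: the two actions should sum to ~1.0.
print(np.array([0.0, 1.0]) @ W_g)
```

Note that the inverse model never sees ground-truth actions; the forward model alone supplies the training signal, and among the infinitely many action pairs that reach the target, descending the distal error settles on one (here the symmetric split).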

Third, Nguyen-Tuong & Peters discuss challenges for model-based learning, which are (copied directly from Table 2, pg. 13):

  • Data challenges: high dimensionality, smoothness, richness of data, noise, outliers, redundant data, missing data
  • Algorithmic constraints: incremental updates, real-time, online learning, efficiency, large data sets, prior knowledge, sparse data
  • Real-world challenges: safety, robustness, generalization, interaction, stability, uncertainty in the environment

Fourth, Nguyen-Tuong & Peters give an overview of how some of the above challenges can be addressed by machine learning techniques. They break the approaches down into two main categories: global learning techniques (which learn a single function over the entire input space) and local techniques (which approximate the function only locally, around the current query point). They give a lot of examples but don’t go much into the details, so I will need to take a further look at some of their referenced papers.
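
As a concrete contrast, here is a minimal locally weighted regression sketch in the spirit of the local methods the survey covers (e.g., LWPR); the example and all of its parameters are my own invention:

```python
import numpy as np

def lwr_predict(X, y, x_query, bandwidth=0.5):
    """Locally weighted regression: solve a separate weighted
    least-squares problem around each query point, so the function is
    only ever approximated locally."""
    d = np.linalg.norm(X - x_query, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)    # Gaussian kernel weights
    Xb = np.hstack([X, np.ones((len(X), 1))])  # local affine model
    WX = Xb * w[:, None]
    beta, *_ = np.linalg.lstsq(WX.T @ Xb, WX.T @ y, rcond=None)
    return np.append(x_query, 1.0) @ beta

# Toy data (invented): a noisy sine, which no single global linear
# model could fit, but which local affine fits handle easily.
rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
print(lwr_predict(X, y, np.array([1.0])))  # should be near sin(1) ~ 0.84
```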

Finally, the authors present three case studies as examples of how predictive model-based control has been successfully applied. Their first case study, on “simulation-based optimization”, refers to using forward models for controlling a helicopter. In particular, they reference a paper by Ng et al. in which explicit rollouts of a simulator are used to learn an appropriate control policy. Their second case study involves “approximation-based inverse dynamics control”, in which the inverse dynamics must be learned (rather than pre-programmed) because for most robotic platforms the idealized rigid-body dynamics assumptions do not hold exactly. Their third case study focuses on “learning operational space control”, i.e., learning a control policy in task space rather than in joint space.
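
Here is a minimal sketch of the rollout idea behind the first case study; this is my own toy example (a 1-D linear system and a scalar-gain policy), not the helicopter setup from Ng et al.:

```python
import numpy as np

def rollout_cost(gain, model, s0, horizon=20):
    """Score a candidate linear policy a = -gain * s by rolling it out
    in the learned/simulated forward model, never on the real system."""
    s, cost = s0, 0.0
    for _ in range(horizon):
        a = -gain * s
        s = model(s, a)                 # one simulator step
        cost += s ** 2 + 0.1 * a ** 2   # quadratic state/action cost
    return cost

# Stand-in "learned" model of a toy unstable 1-D system (invented).
model = lambda s, a: 1.1 * s + 0.5 * a

# Simulation-based optimization: evaluate candidate policies purely by
# model rollouts and keep the best-scoring one.
gains = np.linspace(0.0, 2.0, 41)
best = min(gains, key=lambda g: rollout_cost(g, model, s0=1.0))
print(f"best gain found via rollouts: {best:.2f}")
```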

Takeaways

There are a few interesting things that I noted while reading this article. First, the traditional notion of discriminative/generative models kind of breaks down in this discussion. Nguyen-Tuong & Peters are talking in general about learning forward models (i.e., generative models), yet they discuss ways to learn these models that resemble discriminative (i.e., direct learning) or generative (i.e., distal-teacher learning) approaches. So, I suppose if you are learning a hierarchical model, you could have different parts be generative or discriminative; the whole thing doesn’t have to be one way or the other.

Nguyen-Tuong & Peters also discuss how the combination of forward and inverse models (i.e. mixed models) is useful because “information encoded in the forward model can help to resolve the non-uniqueness, i.e., the ill-posedness, of the inverse model” (pg. 7). This is not something I’d thought about before in quite that way: I’ve long thought it useful to have a generative model and then learn a discriminative model from it, but I hadn’t considered that the generative model could also constrain the discriminative model when the function it’s trying to learn isn’t one-to-one.
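
As a toy illustration of that point (my own sketch, not from the paper): in a redundant system where many action pairs reach the same target, the forward model can filter a set of candidate actions down to the consistent ones, leaving only a tie to break:

```python
import numpy as np

# Redundant toy system (invented): s' = s + a1 + a2, so infinitely many
# action pairs reach any given target.
forward = lambda s, a: s + a.sum()

def constrained_inverse(s, s_target, n_candidates=500, tol=0.1):
    """Let the forward model resolve the ill-posed inverse: sample
    candidate actions, keep only those the forward model predicts will
    reach the target, then break the remaining tie by minimum effort."""
    rng = np.random.default_rng(3)
    cands = rng.uniform(-2, 2, size=(n_candidates, 2))
    errs = np.abs(np.array([forward(s, a) for a in cands]) - s_target)
    consistent = cands[errs < tol]
    return consistent[np.argmin(np.linalg.norm(consistent, axis=1))]

print(constrained_inverse(s=0.0, s_target=1.0))  # roughly (0.5, 0.5)
```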

I’m interested in reading more about the distal-teacher learning technique, which sounds to me like something that could potentially be used when learning from mental simulations. However, Nguyen-Tuong & Peters say that “distal teacher learning has several potential disadvantages, such as learning stability, numerical stability, and error accumulation” (pg. 13), and unfortunately they don’t cite sources for that claim. In general, I would like to better understand the strengths and weaknesses of this approach.