Learning compound multi-step controllers under unknown dynamics
Han, W., Levine, S., & Abbeel, P. (2015). Learning Compound Multi-Step Controllers under Unknown Dynamics. Proceedings of the 28th IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Retrieved from http://rll.berkeley.edu/reset_controller/reset_controller.pdf
Summary
In this paper, Han et al. build on the work of Levine et al. to learn controllers for tasks involving multiple steps. This is difficult for traditional approaches to control, which assume that a task's initial conditions are the same on every trial; that assumption rarely holds for compound tasks, because the initial state of each step depends on the final state of the previous step. To address this, Han et al. formulate a way to learn forward and reset controllers simultaneously. Then, in the compound task, if a forward controller fails, the corresponding reset controller can be used to undo the movement and try again. Training compound controllers in this way is much more effective than trying to learn a single global controller for the whole task.
Methods
n/a
Algorithm
Training the forward and reset controllers is actually pretty straightforward. Han et al. run the forward and reset controllers in sequence $N$ times, and use these samples to update each controller against its own objective: the task-specific cost for the forward controller, and a reset cost (which rewards returning to the initial state) for the reset controller.
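A minimal sketch of that alternating loop in Python. The names here (`rollout`, `fit_time_varying_lqg`, `task_cost`, `reset_cost`) are hypothetical placeholders, not the authors' code; in the paper each controller is a time-varying linear-Gaussian policy refit from the collected samples.

```python
def train_forward_and_reset(forward, reset, env, N, K):
    """Alternate forward and reset rollouts, refitting both controllers."""
    for _ in range(K):                      # K outer training iterations
        fwd_samples, rst_samples = [], []
        for _ in range(N):                  # collect N paired rollouts
            fwd_samples.append(rollout(env, forward))  # attempt the task
            rst_samples.append(rollout(env, reset))    # return to the start
        # Refit each controller against its own cost function.
        forward = fit_time_varying_lqg(fwd_samples, cost=task_cost)
        reset = fit_time_varying_lqg(rst_samples, cost=reset_cost)
    return forward, reset
```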
Training the compound controller follows largely the same idea. Each forward controller is executed in sequence. If a controller fails, then the corresponding reset controller is used to undo the attempt, and the forward controller is run again; this is repeated until the step succeeds. After $N$ samples are collected for a controller, it is refit to those samples. These $N$ samples count as one “iteration”, and all controllers are trained for $K$ iterations.
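A sketch of the compound loop, under the same placeholder assumptions as above; `succeeded()` is a hypothetical per-step success check standing in for whatever test each task uses.

```python
def train_compound(steps, env, N, K):
    """steps: list of (forward, reset) controller pairs, one per task step."""
    for _ in range(K):                          # K training iterations
        samples = [[] for _ in steps]
        for _ in range(N):                      # N samples per controller
            for i, (forward, reset) in enumerate(steps):
                traj = rollout(env, forward)    # attempt step i
                while not succeeded(traj):      # on failure: undo and retry
                    rollout(env, reset)
                    traj = rollout(env, forward)
                samples[i].append(traj)         # keep the successful attempt
        # Refit each forward controller from its N collected samples.
        steps = [(fit_time_varying_lqg(s, cost=task_cost), reset)
                 for s, (_, reset) in zip(samples, steps)]
    return steps
```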
As in the previous work, the individual linear-Gaussian controllers can also be used to train a more general parameterized policy (such as a neural network). Because both forward and reset controllers are trained, though, the system can train the neural network autonomously: it runs the forward controllers, uses those samples as training data for the network, and resets automatically using the reset controllers.
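A sketch of that autonomous data-collection loop, assuming a hypothetical PyTorch setup. The network is fit here by plain supervised regression onto the forward controller's actions, a simplification of the guided policy search procedure the authors actually use; the point is that the reset controller makes collection fully automatic, with no human resetting the scene between rollouts.

```python
import torch

def train_neural_policy(policy_net, forward, reset, env, n_rollouts, epochs):
    states, actions = [], []
    for _ in range(n_rollouts):
        traj = rollout(env, forward)        # collect (state, action) pairs
        states += list(traj.states)
        actions += list(traj.actions)
        rollout(env, reset)                 # automatic reset, no human needed
    X = torch.tensor(states, dtype=torch.float32)
    Y = torch.tensor(actions, dtype=torch.float32)
    opt = torch.optim.Adam(policy_net.parameters())
    for _ in range(epochs):                 # regress network onto controller actions
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(policy_net(X), Y)
        loss.backward()
        opt.step()
    return policy_net
```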
Takeaways
This way of training compound controllers is exciting, because it could potentially be integrated with higher-level planning algorithms to determine how to accomplish more sophisticated and complex tasks. This brings up an interesting distinction in the idea of simulation—that you can potentially have simulations at different levels of abstraction, as well as at different levels of granularity. In this case, the robot can learn a dynamics model for multiple motion primitives (the individual linear-Gaussian controllers) but it might also need to be able to learn a higher-level (perhaps more qualitative) form of simulation in order to reason about how to accomplish the task in the first place. In the case of screwing in a bolt, the robot might need high-level qualitative knowledge about how the task works (first need to pick up the wrench, then bring it to the bolt, then turn it because bolts need to be turned to go further into the hole, then repeat, etc.), but as shown by this work, it also needs low-level knowledge about the dynamics of the task in order to actually execute the subparts of the action.