Guez, A., Silver, D., & Dayan, P. (2013). Scalable and efficient Bayes-adaptive reinforcement learning based on Monte-Carlo tree search. Journal of Artificial Intelligence Research, 48, 841–883. doi:10.1613/jair.4117

Summary

In this paper, Guez et al. lay out an approach for performing Bayes-adaptive reinforcement learning (e.g. Dearden et al., 1999) using Monte-Carlo tree search (e.g. Browne et al., 2012). They introduce an algorithm called BAMCP (Bayes-Adaptive Monte-Carlo Planning), which works by sampling an MDP from the agent’s beliefs over the transition/reward functions at the root of each simulation, and then running that single Monte-Carlo tree search simulation entirely within the sampled MDP. Averaged over many simulations, this amounts to an approximate Monte-Carlo integration of expected returns over the posterior distribution of MDPs.
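In symbols (my notation, not the paper's): the quantity estimated for each action a at the root history h is the posterior-expected return,

$$
Q(h,a) \;=\; \mathbb{E}_{M \sim P(M \mid h)}\!\left[\, \mathbb{E}\!\left[\textstyle\sum_{t \ge 0} \gamma^t r_t \,\middle|\, M, h, a\right] \right] \;\approx\; \frac{1}{K} \sum_{k=1}^{K} R_k, \qquad M_k \sim P(M \mid h),
$$

where R_k is the return of the k-th simulation, run entirely inside the sampled MDP M_k.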

Methods

n/a

Algorithm

Rather than reproducing the full algorithm here, I implemented a demo of BAMCP; a minimal sketch of the core loop is included below for reference.
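This sketch is my own illustrative code, not the demo referenced above or the paper's implementation; the `posterior_sampler` callable and the `mdp.step(state, action) -> (next_state, reward)` interface are assumptions for the sake of a self-contained example.

```python
import math
import random
from collections import defaultdict

def bamcp_plan(posterior_sampler, actions, root_state,
               n_sims=1000, depth=15, gamma=0.95, c_ucb=1.4):
    """Root-sampling MCTS: each simulation runs inside a single MDP drawn
    from the posterior, and statistics are pooled across simulations, so
    averaged returns approximate the posterior-expected return."""
    Nh = defaultdict(int)    # visit counts per history node
    N = defaultdict(int)     # visit counts per (history, action)
    Q = defaultdict(float)   # running mean return per (history, action)

    def rollout(mdp, state, d):
        # Default policy: uniform-random actions in the sampled MDP.
        g, discount = 0.0, 1.0
        for _ in range(d):
            state, r = mdp.step(state, random.choice(actions))
            g += discount * r
            discount *= gamma
        return g

    def simulate(mdp, state, history, d):
        if d == 0:
            return 0.0
        if Nh[history] == 0:
            # First visit: expand the node, then estimate with a rollout.
            Nh[history] += 1
            return rollout(mdp, state, d)
        # UCB1 action selection over statistics pooled across sampled MDPs.
        a = max(actions, key=lambda b: float('inf') if N[(history, b)] == 0
                else Q[(history, b)]
                + c_ucb * math.sqrt(math.log(Nh[history]) / N[(history, b)]))
        next_state, r = mdp.step(state, a)  # dynamics come from the sampled MDP
        g = r + gamma * simulate(mdp, next_state,
                                 history + (a, next_state), d - 1)
        Nh[history] += 1
        N[(history, a)] += 1
        Q[(history, a)] += (g - Q[(history, a)]) / N[(history, a)]
        return g

    root = (root_state,)
    for _ in range(n_sims):
        mdp = posterior_sampler()  # root sampling: one posterior draw per simulation
        simulate(mdp, root_state, root, depth)
    return max(actions, key=lambda b: Q[(root, b)])
```

The key design point is that the posterior is sampled only once per simulation, at the root; within a simulation the sampled MDP is treated as fully known, which keeps each simulation as cheap as ordinary MCTS.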

Takeaways

Bayes-adaptive reinforcement learning is powerful: it offers the flexibility of incorporating structured priors into our planners, and of planning properly under uncertain transition and reward functions. Although exact Bayes-adaptive planning is intractable in the general case, MCTS with root sampling offers a relatively efficient Monte-Carlo approximation to the Bayes-optimal action values.
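As a concrete instance of the "structured prior" point, a common choice in discrete domains (used in the paper's experiments) is a Dirichlet prior over transition probabilities; the class below is my own minimal sketch of how such a posterior plugs into the root-sampling loop, with illustrative names.

```python
import numpy as np

class DirichletTransitionModel:
    """Dirichlet-multinomial posterior over P(s' | s, a); a minimal sketch."""
    def __init__(self, n_states, n_actions, alpha=1.0):
        # Symmetric Dirichlet pseudo-counts for every (s, a) pair.
        self.counts = np.full((n_states, n_actions, n_states), alpha)

    def update(self, s, a, s_next):
        # The Bayesian posterior update is just a count increment.
        self.counts[s, a, s_next] += 1.0

    def sample_transitions(self):
        # One posterior draw: a concrete transition tensor for root sampling.
        n_states, n_actions, _ = self.counts.shape
        return np.array([[np.random.dirichlet(self.counts[s, a])
                          for a in range(n_actions)]
                         for s in range(n_states)])
```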