Model-Based Reinforcement Learning

Intro

  • pros:
    • Can efficiently learn a model by supervised learning methods
    • Rapid adaptation to new problems and situations (via planning)
    • Can reason about model uncertainty
  • cons:
    • Two sources of approximation error: in estimating the model and in estimating the value function
    • If the model is inaccurate ⇒ planning will compute a suboptimal policy
    • Hence, asymptotically model-free methods are often better
    • Solution 1: reason explicitly about model uncertainty (BRL)
    • Solution 2: use model-free RL when the model is wrong
    • Solution 3: integrate model-based and model-free methods
  • Learning allows an agent to improve its policy from its interactions with the environment.
  • Planning allows an agent to improve its policy without further interaction (a minimal sketch of this learn-then-plan loop follows below).
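
The learn-then-plan loop can be made concrete with a minimal, illustrative sketch (not taken from the notes): a tabular model is fitted to sampled transitions by maximum-likelihood counting, which plays the role of the "supervised learning" step, and the agent then plans with value iteration on the learned model, i.e. improves its policy without further interaction. The toy MDP, its sizes, and all constants here are made-up assumptions.

```python
# Sketch only (assumed toy MDP): learn a tabular model by counting, then plan.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.95
rng = np.random.default_rng(0)

# Hidden "true" dynamics, used only to generate experience for the example.
true_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
true_R = rng.normal(size=(n_states, n_actions))

# 1) Model learning as supervised learning: maximum-likelihood estimates of
#    P(s' | s, a) and R(s, a) from sampled transitions.
counts = np.zeros((n_states, n_actions, n_states))
reward_sum = np.zeros((n_states, n_actions))
for _ in range(5000):
    s, a = rng.integers(n_states), rng.integers(n_actions)
    s_next = rng.choice(n_states, p=true_P[s, a])
    counts[s, a, s_next] += 1
    reward_sum[s, a] += true_R[s, a] + rng.normal(scale=0.1)

visits = counts.sum(axis=2, keepdims=True)
P_hat = np.where(visits > 0, counts / np.maximum(visits, 1), 1.0 / n_states)
R_hat = reward_sum / np.maximum(visits[..., 0], 1)

# 2) Planning: value iteration on the learned model, no further interaction.
V = np.zeros(n_states)
for _ in range(200):
    V = np.max(R_hat + gamma * P_hat @ V, axis=1)
policy = np.argmax(R_hat + gamma * P_hat @ V, axis=1)
print("Greedy policy computed from the learned model:", policy)
```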

FlatMC
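
A minimal sketch of flat Monte-Carlo search as it is commonly described: estimate Q(s, a) for each action at the current state by averaging the returns of uniform-random rollouts, then act greedily at the root. The simulator interface used here (reset_to, step, legal_actions) is a hypothetical assumption.

```python
# Hedged sketch of flat Monte-Carlo search; the simulator API is hypothetical.
import random

def flat_mc_action(sim, state, n_rollouts=100, depth=20, gamma=1.0):
    best_action, best_value = None, float("-inf")
    for action in sim.legal_actions(state):
        total = 0.0
        for _ in range(n_rollouts):
            s, discount, ret = sim.reset_to(state), 1.0, 0.0
            a = action  # first move is the root action being evaluated
            for _ in range(depth):
                s, r, done = sim.step(s, a)
                ret += discount * r
                discount *= gamma
                if done:
                    break
                a = random.choice(sim.legal_actions(s))  # uniform rollout policy
            total += ret
        q = total / n_rollouts  # Monte-Carlo estimate of Q(state, action)
        if q > best_value:
            best_action, best_value = action, q
    return best_action
```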

MCTS

  • While MCTS applies MC control to the sub-MDP starting from the current state, TD search applies SARSA to it.
  • For each simulation step: update the Q values using (s, a, r, s', a') (see the sketch below).

  • model-free
  • bootstrapping
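
A minimal sketch of the per-step SARSA update that TD search applies to simulated experience (s, a, r, s', a'). The simulator interface and the ε-greedy simulation policy below are assumptions for illustration, not taken from the notes.

```python
# Sketch only: one simulated step of TD search with a SARSA (bootstrapping) update.
from collections import defaultdict
import random

Q = defaultdict(float)          # Q[(state, action)], tabular action values
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def epsilon_greedy(state, actions):
    """Assumed simulation policy: explore with probability epsilon."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def td_search_step(sim, s, a):
    """One simulated step: act, observe (s, a, r, s', a'), bootstrap from Q(s', a')."""
    s_next, r, done = sim.step(s, a)                      # simulated experience
    a_next = epsilon_greedy(s_next, sim.legal_actions(s_next))
    target = r if done else r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])             # SARSA update
    return s_next, a_next, done
```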

Explore / Exploit Strategies

  • RL agents start to act without a model of the environment: they have to learn from experience what to do in order to fulfil tasks and achieve a high average return.
  • Online decision-making involves a fundamental choice:
    • Exploitation: Make the best decision given current information
    • Exploration: Gather more information
  • The best long-term strategy may involve short-term sacrifices
  • Gather enough information to make the best overall decisions ⇒ exploration is a fundamental intelligent behaviour (a small ε-greedy sketch follows below)
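
As an illustration of trading off exploitation and exploration, here is a minimal ε-greedy sketch on a toy multi-armed bandit: with probability ε the agent explores a random arm (gathers information), otherwise it exploits its current value estimates. The bandit and all constants are made up for the example.

```python
# Sketch only: epsilon-greedy exploration on a toy 10-armed bandit.
import numpy as np

rng = np.random.default_rng(1)
true_means = rng.normal(size=10)        # hidden arm values (unknown to the agent)
Q = np.zeros(10)                        # value estimates
N = np.zeros(10)                        # pull counts
epsilon = 0.1

for t in range(10_000):
    if rng.random() < epsilon:
        a = int(rng.integers(10))       # explore: gather more information
    else:
        a = int(np.argmax(Q))           # exploit: best arm given current estimates
    r = rng.normal(true_means[a])       # sample a noisy reward
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]           # incremental mean update

print("best true arm:", int(np.argmax(true_means)), "| most pulled:", int(np.argmax(N)))
```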