Model Based Reinforcement Learning
Intro
- pros:
  - Can efficiently learn the model by supervised learning methods
  - Rapid adaptation to new problems and situations (via planning)
  - Can reason about model uncertainty
- cons:
  - Two sources of approximation error: in estimating the model and the value function
  - If the model is inaccurate → planning will compute a suboptimal policy
  - Hence, asymptotically, model-free methods are often better
  - Solution 1: reason explicitly about model uncertainty (Bayesian RL)
  - Solution 2: use model-free RL when the model is wrong
  - Solution 3: integrate model-based and model-free learning (e.g. Dyna; see the sketch after this list)
- Learning: the agent improves its policy from its interactions with the environment.
- Planning: the agent improves its policy without further interaction, using a model.
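A minimal sketch of Solution 3 in the style of Dyna-Q: model-free Q-learning updates from real experience are combined with planning updates from a learned tabular model. The gym-style environment interface (`env.reset()`, `env.step()`, `env.action_space.n`) and all hyperparameters are assumptions for illustration, not part of the original notes.

```python
import random
from collections import defaultdict

def dyna_q(env, episodes=200, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Dyna-Q sketch: learn from real experience, then plan with a learned model."""
    Q = defaultdict(float)                 # Q[(s, a)] value estimates
    model = {}                             # model[(s, a)] = (r, s_next, done)
    actions = list(range(env.action_space.n))

    def policy(s):
        if random.random() < epsilon:
            return random.choice(actions)                 # explore
        return max(actions, key=lambda a: Q[(s, a)])      # exploit

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = policy(s)
            s_next, r, done, _ = env.step(a)

            # Model-free (direct RL) update from real experience
            target = r + (0 if done else gamma * max(Q[(s_next, b)] for b in actions))
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            # Learn the model by supervised learning (here: simple memorisation)
            model[(s, a)] = (r, s_next, done)

            # Planning: replay simulated experience drawn from the learned model
            for _ in range(planning_steps):
                (ps, pa), (pr, ps_next, pdone) = random.choice(list(model.items()))
                ptarget = pr + (0 if pdone else gamma * max(Q[(ps_next, b)] for b in actions))
                Q[(ps, pa)] += alpha * (ptarget - Q[(ps, pa)])

            s = s_next
    return Q
```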
FlatMC
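The notes leave this section empty, so here is a minimal sketch of flat (simple) Monte-Carlo search: for the current state, roll out a fixed simulation policy after each candidate root action, average the sampled returns, and act greedily. The `simulate_step` model interface, `rollout_policy`, and parameters are assumptions for illustration.

```python
import random

def flat_mc_search(state, actions, simulate_step, rollout_policy,
                   n_rollouts=100, depth=50, gamma=1.0):
    """Flat Monte-Carlo search: evaluate each root action by its mean rollout return."""
    def rollout_return(s, a):
        total, discount = 0.0, 1.0
        for _ in range(depth):
            s, r, done = simulate_step(s, a)   # sample the model: (next state, reward, terminal)
            total += discount * r
            discount *= gamma
            if done:
                break
            a = rollout_policy(s)              # fixed simulation policy after the first action
        return total

    # Average return of each candidate action at the root, then act greedily
    q = {a: sum(rollout_return(state, a) for _ in range(n_rollouts)) / n_rollouts
         for a in actions}
    return max(q, key=q.get)
```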
MCTS
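This section is also empty in the notes; the following is a compact UCT-style sketch of MCTS with the usual four phases (selection by UCB1, expansion, random rollout, backup). The `legal_actions` and `simulate_step` interfaces, the exploration constant `c`, and other parameters are assumptions for illustration.

```python
import math
import random

class Node:
    """Search-tree node for one state; `reward` is the reward received on entering it."""
    def __init__(self, state, parent=None, action=None, reward=0.0, terminal=False):
        self.state, self.parent, self.action = state, parent, action
        self.reward, self.terminal = reward, terminal
        self.children = []       # expanded child nodes
        self.untried = None      # actions not yet expanded from this node
        self.visits = 0
        self.total_return = 0.0

def mcts(root_state, legal_actions, simulate_step, n_simulations=500, c=1.4, depth=50):
    """UCT-style MCTS: selection (UCB1), expansion, random rollout, backup."""
    root = Node(root_state)
    root.untried = list(legal_actions(root_state))

    def ucb(parent, child):
        return (child.total_return / child.visits
                + c * math.sqrt(math.log(parent.visits) / child.visits))

    for _ in range(n_simulations):
        node, ret = root, 0.0

        # 1. Selection: descend fully expanded nodes by UCB1, accumulating rewards
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch, p=node: ucb(p, ch))
            ret += node.reward

        # 2. Expansion: add one child for an untried action
        if node.untried:
            a = node.untried.pop()
            s_next, r, done = simulate_step(node.state, a)
            child = Node(s_next, parent=node, action=a, reward=r, terminal=done)
            child.untried = [] if done else list(legal_actions(s_next))
            node.children.append(child)
            node, ret = child, ret + r

        # 3. Rollout: random simulation policy from the selected/expanded node
        state, done = node.state, node.terminal
        for _ in range(depth):
            if done:
                break
            a = random.choice(legal_actions(state))
            state, r, done = simulate_step(state, a)
            ret += r

        # 4. Backup: propagate the sampled return up to the root
        while node is not None:
            node.visits += 1
            node.total_return += ret
            node = node.parent

    # Recommend the most visited root action
    return max(root.children, key=lambda ch: ch.visits).action
```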
TD-search
- While MCTS applies MC control to the sub-MDP starting "from now", TD search applies Sarsa to it.
- For each simulation step: update the Q values using (s, a, r, s', a') (see the sketch after this list)
  - model-free (applied to simulated experience)
  - bootstrapping
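A minimal sketch of those two bullets: simulate episodes from the current (root) state with a model and update Q with Sarsa (bootstrapping) on every simulated step. The `simulate_step` model interface, epsilon-greedy simulation policy, and hyperparameters are assumptions for illustration.

```python
import random
from collections import defaultdict

def td_search(root_state, actions, simulate_step, n_episodes=500,
              depth=50, alpha=0.1, gamma=0.95, epsilon=0.1):
    """TD search sketch: apply Sarsa to simulated experience starting from the root state."""
    Q = defaultdict(float)

    def epsilon_greedy(s):
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(s, a)])

    for _ in range(n_episodes):
        s = root_state                            # every simulation starts "from now"
        a = epsilon_greedy(s)
        for _ in range(depth):
            s_next, r, done = simulate_step(s, a)        # sample the model
            a_next = epsilon_greedy(s_next)
            # Sarsa update: bootstrap from Q(s', a') instead of waiting for a full MC return
            target = r if done else r + gamma * Q[(s_next, a_next)]
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            if done:
                break
            s, a = s_next, a_next

    return max(actions, key=lambda a: Q[(root_state, a)])
```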
Explore / Exploit Strategies
- An RL agent starts acting without a model of the environment: it has to learn from its experience what to do in order to fulfil its tasks and achieve a high average return.
- Online decision-making involves a fundamental choice:
- Exploitation: Make the best decision given current information
- Exploration: Gather more information
- The best long-term strategy may involve short-term sacrifices
- Gather enough information to make the best overall decisions → exploration as fundamental intelligent behaviour (a minimal bandit sketch follows below)
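A minimal sketch of the exploit/explore choice on a multi-armed bandit using epsilon-greedy. The `pull_arm` reward interface and the parameter values are assumptions for illustration.

```python
import random

def epsilon_greedy_bandit(pull_arm, n_arms, n_steps=1000, epsilon=0.1):
    """Epsilon-greedy on a multi-armed bandit: mostly exploit, occasionally explore."""
    counts = [0] * n_arms
    values = [0.0] * n_arms                              # incremental mean reward per arm

    for _ in range(n_steps):
        if random.random() < epsilon:
            a = random.randrange(n_arms)                 # explore: gather more information
        else:
            a = max(range(n_arms), key=lambda i: values[i])  # exploit: best current estimate
        r = pull_arm(a)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]
    return values
```

Here epsilon controls the trade-off directly: a larger epsilon gathers more information (exploration) at the cost of short-term reward (exploitation), which is exactly the short-term sacrifice mentioned above.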