Introduction

The Reinforcement Learning Problem

  • RL is learning from interaction.
  • There is no supervisor, only a reward signal (evaluative feedback).
  • Rewards may be delayed: the effect of an action can show up many steps later.
  • The order of decisions matters, because each decision affects the outcome of subsequent decisions (the data are not i.i.d.).
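
The points above can be illustrated with a minimal interaction loop. This is only a sketch: the Corridor environment, its reset/step methods and the always_right policy are hypothetical names made up for this example, not part of any library.

```python
class Corridor:
    """Hypothetical toy environment: start in cell 0, the goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        # No supervisor says which action was correct; the only feedback is
        # a reward, and it arrives only once the goal is reached (delayed).
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the rewards observed at each step."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)                  # the agent decides
        state, reward, done = env.step(action)  # the environment responds
        rewards.append(reward)
        if done:
            break
    return rewards


def always_right(state):
    """Hypothetical fixed policy: always move right."""
    return 1


episode_rewards = run_episode(Corridor(length=5), always_right)
print(episode_rewards)
```

In this toy setting the first three steps give no reward even though they are all necessary to reach the goal, which is what makes the feedback delayed and the sequence of decisions important.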

Difference to supervised learning

  • Supervised learning learns from labelled data: inputs x paired with target outputs y.
  • Reinforcement learning learns from transitions of the form (state, action, reward, next state), which can be used to:
    • predict next state
    • predict reward
    • learn behaviour (which action to choose where)
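
As a rough sketch of how the three uses above relate to the data, the snippet below stores experience as (state, action, reward, next state) tuples and derives simple count-based estimates from them. The logged transitions, the state and action names, and the helper functions are all invented for illustration; the greedy rule looks only at the immediate reward, so it is a deliberately myopic stand-in for learning behaviour.

```python
from collections import Counter, defaultdict, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

# Hypothetical logged experience (made-up states "A", "B", "C")
transitions = [
    Transition("A", "go", 0.0, "B"),
    Transition("A", "go", 0.0, "B"),
    Transition("A", "go", 0.0, "A"),
    Transition("B", "go", 1.0, "C"),
    Transition("B", "stay", 0.0, "B"),
]

next_state_counts = defaultdict(Counter)  # (state, action) -> successor counts
reward_sums = defaultdict(float)          # (state, action) -> summed reward
visits = Counter()                        # (state, action) -> number of samples

for t in transitions:
    key = (t.state, t.action)
    next_state_counts[key][t.next_state] += 1
    reward_sums[key] += t.reward
    visits[key] += 1


def predict_next_state(state, action):
    """Most frequently observed successor of (state, action)."""
    return next_state_counts[(state, action)].most_common(1)[0][0]


def predict_reward(state, action):
    """Average reward observed after taking action in state."""
    return reward_sums[(state, action)] / visits[(state, action)]


def greedy_action(state, actions=("go", "stay")):
    """Action with the highest estimated immediate reward; unseen actions rank last."""
    def score(action):
        if visits[(state, action)] == 0:
            return float("-inf")
        return predict_reward(state, action)
    return max(actions, key=score)


print(predict_next_state("A", "go"), predict_reward("B", "go"), greedy_action("B"))
```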

Exploration vs Exploitation

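  • Exploration: try out actions to gather more information about the environment.
  • Exploitation: use the information gathered so far to maximise reward.
  • The two must be traded off: exploring too little can miss better actions, while exploring too much gives up reward that is already known to be available.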
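
One common way to handle this trade-off is an epsilon-greedy rule: with probability epsilon pick a random action (explore), otherwise pick the action with the best current estimate (exploit). The sketch below applies it to a multi-armed bandit. The function name, its parameters and the Gaussian reward model are assumptions chosen for this example.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=2000, noise=1.0, seed=0):
    """Run epsilon-greedy on a bandit whose arm rewards are Gaussian (an assumption).

    Returns the estimated mean reward per arm, the pull count per arm,
    and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick any arm at random
        else:
            # exploit: pick the arm with the best estimate (first arm wins ties)
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], noise)
        counts[arm] += 1
        # incremental update of the sample mean for the chosen arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total = epsilon_greedy_bandit([0.1, 0.5, 0.9], epsilon=0.1)
print(counts)
```

Setting epsilon=0 and noise=0 in this sketch shows the risk of pure exploitation: the agent pulls the first arm, sees a positive reward, and never tries the other arms, even though they pay more.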