Reinforcement Learning Summary

Introduction

Introduction
The Reinforcement Learning Problem
Difference to supervised learning
Exploration vs Exploitation

The Reinforcement Learning Problem

RL is learning from interaction.
There is no supervisor, only signals of reward/evaluative feedback.
Rewards can be delayed: the consequence of an action may only show up many steps later.
The order of decisions matters, because each decision affects the outcome of subsequent ones (the data are not i.i.d.).
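
The interaction loop described here can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the Corridor environment, its reset/step interface and the run_episode helper are all hypothetical names made up for this example. The agent only receives a non-zero reward when it reaches the last cell, which shows what a delayed reward looks like.

```python
class Corridor:
    """Hypothetical toy environment: start in cell 0, the goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the agent cannot leave the corridor.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        # Delayed reward: nothing is paid out until the goal is reached.
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the reward observed at each step."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)                  # the agent decides
        state, reward, done = env.step(action)  # the environment responds
        rewards.append(reward)
        if done:
            break
    return rewards


# A fixed "always go right" policy: three unrewarded steps, then the goal.
rewards = run_episode(Corridor(length=5), policy=lambda state: 1)
print(rewards)  # [0.0, 0.0, 0.0, 1.0]
```

Every early step looks worthless when judged by its immediate reward alone, yet each one is needed to reach the goal. That is why the sequence of decisions has to be evaluated as a whole.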

Difference to supervised learning

Supervised learning learns from labelled data: pairs (x, y) of an input x and its target y.
Reinforcement learning learns from transitions (state, action, reward, next state), which can be used to:
- predict next state
- predict reward
- learn behaviour (which action to choose where)
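
As a rough sketch of the three uses listed above, the following Python snippet stores experience as (state, action, reward, next state) tuples and derives simple tabular estimates from them. The data, the Transition tuple and the helper functions are hypothetical and only meant for illustration. The behaviour rule in particular is deliberately myopic: it looks at the immediate reward only, whereas RL methods also account for delayed rewards.

```python
from collections import Counter, defaultdict, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state"])

# Hypothetical hand-written experience, for illustration only.
data = [
    Transition("A", "right", 0.0, "B"),
    Transition("A", "right", 0.0, "B"),
    Transition("A", "left", 0.0, "A"),
    Transition("B", "right", 1.0, "C"),
    Transition("B", "left", 0.0, "A"),
]

next_state_counts = defaultdict(Counter)  # (state, action) -> successor counts
reward_sums = defaultdict(float)          # (state, action) -> summed reward
visits = Counter()                        # (state, action) -> number of samples

for t in data:
    key = (t.state, t.action)
    next_state_counts[key][t.next_state] += 1
    reward_sums[key] += t.reward
    visits[key] += 1


def predict_next_state(state, action):
    """Most frequently observed successor of (state, action)."""
    return next_state_counts[(state, action)].most_common(1)[0][0]


def predict_reward(state, action):
    """Average reward observed after taking action in state."""
    return reward_sums[(state, action)] / visits[(state, action)]


def greedy_action(state, actions=("left", "right")):
    """Myopic behaviour: the action with the highest predicted immediate reward."""
    return max(actions, key=lambda a: predict_reward(state, a))


print(predict_next_state("A", "right"))  # B
print(predict_reward("B", "right"))      # 1.0
print(greedy_action("B"))                # right
```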

Exploration vs Exploitation

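Exploration: try actions whose outcome is still uncertain, to gather more data about the environment.
Exploitation: use the information gathered so far to maximise reward.
The two are in tension: exploring too little risks missing better actions, while exploring too much wastes reward on actions already known to be poor.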
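
One common and simple way to balance the two is the epsilon-greedy rule: with probability epsilon pick a random action (explore), otherwise pick the action with the best current estimate (exploit). The sketch below applies it to a multi-armed bandit. The notes do not name this method, so treat it as an illustrative choice; the function name, the Gaussian reward noise and the parameter values are assumptions made for the example.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Play a bandit with epsilon-greedy; return value estimates and pull counts."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: uniformly random arm
        else:
            # exploit: arm with the highest current estimate
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: Gaussian noise around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)

        counts[arm] += 1
        # Incremental update of the sample mean.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8])
print(estimates)
print(counts)
```

With epsilon = 0 the agent never deliberately explores and can lock onto a poor arm. With epsilon = 1 it explores forever and never uses what it has learned. Values in between trade the two off.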