Reinforcement Learning Summary

Inverse Reinforcement Learning

Intro
Max Margin Formulation
Reward Formulation

Intro

Goal: derive a policy from observations of optimal (expert) behaviour, by recovering a reward function that explains that behaviour.
A form of learning from demonstration.

Max Margin Formulation

Intuitively, the margin between the expert policy π^* and a candidate policy π needs to decrease.

[source: Abbeel]

Problem: the number of constraints becomes very large (one constraint per candidate policy π).
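
The equation for this section did not survive in these notes. As a hedged sketch only, assuming a linear reward R(s) = w^T φ(s) with feature map φ and discounted feature expectations μ(π) (this notation is an assumption, not taken from the notes), the max-margin problem is commonly written as:

```latex
\min_{w} \; \|w\|_2^2
\quad \text{s.t.} \quad
w^\top \mu(\pi^*) \;\ge\; w^\top \mu(\pi) + m(\pi^*, \pi)
\quad \forall \pi,
\qquad
\mu(\pi) = \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^t \phi(s_t) \,\Big|\, \pi\Big]
```

Here m(π^*, π) is a margin term (a constant margin of 1 gives the standard max-margin form). There is one constraint for every policy π, which is where the large constraint count comes from.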

Reward Formulation

Find a reward function R that satisfies:

[source: Abbeel]

Better suited to large domains.
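
The condition itself is missing from these notes. A minimal sketch of one common reading, assuming a linear reward R(s) = w · φ(s): a candidate R "explains" the expert if the expert's discounted feature expectations score at least as high under w as those of any other policy. The helper names (`feature_expectations`, `explains_expert`, `one_hot`) and the toy trajectories are hypothetical and chosen only for illustration.

```python
import numpy as np

def feature_expectations(trajectories, phi, gamma=0.9):
    """Empirical estimate of mu(pi): the average over trajectories
    of sum_t gamma^t * phi(s_t)."""
    totals = [
        sum((gamma ** t) * phi(s) for t, s in enumerate(traj))
        for traj in trajectories
    ]
    return np.mean(totals, axis=0)

def explains_expert(w, mu_expert, mu_others, margin=0.0):
    """Check w . mu(pi*) >= w . mu(pi) + margin for every other policy."""
    return all(w @ mu_expert >= w @ mu + margin for mu in mu_others)

def one_hot(s, n=3):
    """Assumed feature map: one indicator feature per state."""
    v = np.zeros(n)
    v[s] = 1.0
    return v

# Toy data (made up): the expert heads for state 2, the other policy does not.
expert_trajs = [[0, 1, 2, 2], [0, 1, 2, 2]]
other_trajs = [[0, 0, 0, 0], [0, 1, 1, 1]]

mu_expert = feature_expectations(expert_trajs, one_hot)
mu_other = feature_expectations(other_trajs, one_hot)

w_good = np.array([0.0, 0.0, 1.0])  # rewards state 2
w_bad = np.array([1.0, 0.0, 0.0])   # rewards state 0

print(explains_expert(w_good, mu_expert, [mu_other]))
print(explains_expert(w_bad, mu_expert, [mu_other]))
```

Working with w and μ(π) instead of a per-state reward table keeps the number of unknowns equal to the number of features, which is one plausible reason this view scales better to large domains.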