Inverse Reinforcement Learning
Intro
- Develop a policy based on observations of optimal (expert) behaviour, by recovering the reward function that explains that behaviour
- Learning from Demonstration
Max Margin Formulation
The margin between the expert policy π^* and another policy π should shrink as π gets closer to π^* (intuitively)

- problematic when the number of constraints is large
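
For reference, one common way to write a max-margin objective is sketched below. It is not necessarily the exact form these notes had in mind. It assumes a reward that is linear in state features, R(s) = w^T φ(s). The symbols w, φ, μ, γ and the margin term m(π^*, π) are notation introduced here for illustration. There is one constraint per candidate policy π, which is why the number of constraints can become very large.

```latex
% Discounted feature expectations of a policy (assumed notation)
\mu(\pi) = \mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t}\,\phi(s_t) \,\middle|\, \pi\right]

% Max-margin problem: the expert must beat every other policy by a margin
\min_{w} \; \lVert w \rVert_2^2
\quad \text{s.t.} \quad
w^{\top}\mu(\pi^*) \;\ge\; w^{\top}\mu(\pi) + m(\pi^*, \pi)
\qquad \forall \pi
```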
Reward Formulation
Find a reward function R that satisfies:

- scales better to large domains
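
A minimal sketch of the kind of condition a candidate reward is usually checked against. It assumes (this is not stated in the notes) a reward linear in state features, R(s) = w · φ(s), so that a policy's expected discounted return is w · μ(π), where μ(π) are its discounted feature expectations. All function names, the one-hot features and the toy trajectories are hypothetical.

```python
from typing import Callable, List, Sequence

State = int


def feature_expectations(
    trajectories: Sequence[Sequence[State]],
    phi: Callable[[State], Sequence[float]],
    gamma: float,
) -> List[float]:
    """Empirical discounted feature expectations, averaged over trajectories."""
    dim = len(phi(trajectories[0][0]))
    mu = [0.0] * dim
    for traj in trajectories:
        for t, s in enumerate(traj):
            features = phi(s)
            for k in range(dim):
                mu[k] += (gamma ** t) * features[k]
    return [m / len(trajectories) for m in mu]


def expected_return(w: Sequence[float], mu: Sequence[float]) -> float:
    """Return of a policy under the linear reward R(s) = w . phi(s)."""
    return sum(wk * mk for wk, mk in zip(w, mu))


def satisfies_expert_constraints(
    w: Sequence[float],
    mu_expert: Sequence[float],
    mu_others: Sequence[Sequence[float]],
    margin: float = 0.0,
) -> bool:
    """True if the expert does at least `margin` better than every other policy."""
    target = expected_return(w, mu_expert)
    return all(target >= expected_return(w, mu) + margin for mu in mu_others)


# Toy setup (assumed): three states with one-hot features.
def phi(s: State) -> List[float]:
    return [1.0 if s == k else 0.0 for k in range(3)]


GAMMA = 0.5
mu_expert = feature_expectations([[0, 1, 2, 2]], phi, GAMMA)
mu_others = [
    feature_expectations([[0, 1, 1, 1]], phi, GAMMA),
    feature_expectations([[0, 0, 0, 0]], phi, GAMMA),
]

# A reward that pays for reaching state 2 explains the expert;
# a reward that pays for staying in state 0 does not.
good_w = [0.0, 0.0, 1.0]
bad_w = [1.0, 0.0, 0.0]
good_ok = satisfies_expert_constraints(good_w, mu_expert, mu_others)
bad_ok = satisfies_expert_constraints(bad_w, mu_expert, mu_others)
```

Because the check only involves the weight vector w and the feature expectations, its cost depends on the number of features rather than on the number of states, which is one reason a feature-based reward is attractive for large domains.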