Inverse Reinforcement Learning
Intro
- Develop a policy based on observations of optimal behaviour
- Learning from Demonstration
Max Margin Formulation
The margin between the expert policy π^* and the learned policy π should shrink as π approaches π^*
- Becomes problematic when the number of constraints is large
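To make the margin concrete, here is a minimal sketch that assumes a linear reward R(s) = w · φ(s), which these notes do not state. Under that assumption the value gap between two policies reduces to w · (μ_expert − μ_policy), where μ is the discounted feature expectation. The names `feature_expectations` and `margin`, and the toy trajectories, are hypothetical and for illustration only. Each competing policy contributes one such gap as a constraint, which is one way to read the scaling problem noted above.

```python
import numpy as np

def feature_expectations(trajectories, gamma=0.9):
    """Average discounted sum of feature vectors over trajectories.

    Each trajectory is an array of shape (T, d): one feature vector per step.
    """
    dim = np.asarray(trajectories[0]).shape[1]
    mu = np.zeros(dim)
    for traj in trajectories:
        traj = np.asarray(traj, dtype=float)
        discounts = gamma ** np.arange(len(traj))  # 1, gamma, gamma^2, ...
        mu += discounts @ traj                     # discounted feature sum
    return mu / len(trajectories)

def margin(w, mu_expert, mu_policy):
    """Value gap w . (mu_expert - mu_policy) under R(s) = w . phi(s)."""
    return float(w @ (mu_expert - mu_policy))

# Toy data (assumed): two-dimensional features, two-step trajectories.
expert = [np.array([[1.0, 0.0], [1.0, 0.0]])]
learner = [np.array([[0.0, 1.0], [1.0, 0.0]])]

mu_e = feature_expectations(expert, gamma=0.5)
mu_l = feature_expectations(learner, gamma=0.5)
w = np.array([1.0, 0.0])  # assumed reward weights
gap = margin(w, mu_e, mu_l)
```

A gap near zero means the learned policy matches the expert under this reward; with many candidate policies, one gap has to be tracked per candidate.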
Reward Formulation
Get an R that satisfies:
- Better suited to large domains
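The condition on R is not written out in these notes. As an illustration only, the sketch below checks one standard requirement for a small tabular MDP with state-only rewards: the demonstrated policy must be at least as good as every alternative action in every state, i.e. (P_π − P_a) V ≥ 0 for all actions a, with V = (I − γ P_π)^{-1} R. The function name `satisfies_optimality`, the array layout, and the two-state MDP are assumptions made for this example.

```python
import numpy as np

def satisfies_optimality(P, R, policy, gamma=0.9, tol=1e-9):
    """Check whether `policy` is optimal under state rewards R.

    P:      (A, S, S) array, P[a, s, s'] = transition probability
    R:      (S,) array of state rewards
    policy: (S,) array of action indices chosen by the demonstrator
    """
    n_states = P.shape[1]
    # Row s of P_pi is the transition row of the action the policy takes in s.
    P_pi = P[policy, np.arange(n_states)]
    # Policy evaluation: V = (I - gamma * P_pi)^{-1} R
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)
    # For every action a and state s: expected next value under the policy
    # minus expected next value under a. All entries must be non-negative.
    gaps = (P_pi - P) @ V  # shape (A, S)
    return bool(np.all(gaps >= -tol))

# Toy MDP (assumed): action 0 stays in place, action 1 switches state.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch
])
policy = np.array([1, 0])      # move to state 1, then stay there

good = satisfies_optimality(P, np.array([0.0, 1.0]), policy)
bad = satisfies_optimality(P, np.array([1.0, 0.0]), policy)
```

Here `good` is True because rewarding state 1 explains the demonstrated behaviour, and `bad` is False because rewarding state 0 does not. The all-zero reward also passes this check, so in practice an extra criterion such as a margin is needed to rule out degenerate solutions.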