Inverse Reinforcement Learning

Intro

  • Recover a reward function from observations of (near-)optimal behaviour, then derive a policy from it
  • A form of learning from demonstration

Max Margin Formulation

The margin by which π^* must outperform a policy π should shrink as π gets closer to π^* (intuitively)

[source: Abbeel]

  • Problem: the number of constraints becomes very large (one per candidate policy π)
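
To make the constraint blow-up concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and not taken from the source: the 4-state chain, the one-hot state features, the linear reward R(s) = w·φ(s), and the names `feature_expectations`, `value` and `margins`. It enumerates every deterministic policy, which is exactly the set a max-margin formulation would need one constraint for.

```python
from itertools import product

GAMMA = 0.9
N_STATES = 4          # hypothetical chain: states 0..3
ACTIONS = (-1, +1)    # move left or move right
HORIZON = 50          # truncation of the discounted sum


def step(state, action):
    """Deterministic chain dynamics, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def feature_expectations(policy, start=0):
    """Truncated discounted one-hot state visitation of a policy."""
    mu = [0.0] * N_STATES
    state = start
    for t in range(HORIZON):
        mu[state] += GAMMA ** t
        state = step(state, policy[state])
    return mu


def value(w, mu):
    """Value of a policy under the assumed linear reward w . phi(s)."""
    return sum(wi * mi for wi, mi in zip(w, mu))


# One candidate constraint per deterministic policy: |A|^|S| of them.
all_policies = list(product(ACTIONS, repeat=N_STATES))
n_constraints = len(all_policies)

expert = (+1, +1, +1, +1)     # assumed expert: always move right
w = [0.0, 0.0, 0.0, 1.0]      # assumed reward: only the last state pays

mu_expert = feature_expectations(expert)
margins = {
    pi: value(w, mu_expert) - value(w, feature_expectations(pi))
    for pi in all_policies
    if pi != expert
}
smallest_margin = min(margins.values())

print(n_constraints, smallest_margin)
```

With 2 actions the count is 2^|S|, so a chain of 20 states would already give more than a million policies to constrain. This is why practical methods avoid enumerating all policies up front.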

Reward Formulation

Find a reward function R that satisfies:

[source: Abbeel]

  • Better suited to large domains
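
The condition itself is not reproduced in these notes. One form commonly used in the IRL literature, given here as an assumption and not as the exact expression from the cited source, asks that the expert policy do at least as well under R as any other policy. The discount factor γ and the state sequence s_t are symbols introduced for this sketch:

```latex
\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t) \,\middle|\, \pi^{*}\right]
\;\ge\;
\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t) \,\middle|\, \pi\right]
\qquad \forall \pi
```

Written this way, the condition is also satisfied by the trivial reward R = 0, so on its own it does not pin down a unique reward.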