Inverse Reinforcement Learning in Relational Domains
Authors: Thibaut Munzer, Bilal Piot, Matthieu Geist, Olivier Pietquin, Manuel Lopes
IJCAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the proposed approach, experiments have been run to (i) confirm RCSI can learn a relational reward from demonstrations, (ii) study the influence of the different parameters and (iii) show that IRL outperforms classification based imitation learning when dealing with transfer and changes in dynamics. |
| Researcher Affiliation | Academia | Thibaut Munzer (Inria, Bordeaux, France); Bilal Piot (University Lille 1, Lille, France); Matthieu Geist (Supelec, Metz, France); Olivier Pietquin (University Lille 1, Lille, France); Manuel Lopes (Inria, Bordeaux, France) |
| Pseudocode | No | Figure 1: Sketch of the proposed method: CSI with reward shaping. |
| Open Source Code | No | No statement about open-source code for their method was found. |
| Open Datasets | No | From a target reward R, we compute an optimal policy π. The algorithm is given, as expert demonstrations, Nexpert trajectories starting from a random state and ending when the (first) wait action is selected. As random demonstrations, the algorithm is given Nrandom one-step trajectories starting from random states. |
| Dataset Splits | No | The setting Nrandom = 300 and Nexpert = 15 gives good results and so we will use it in the following experiments. |
| Hardware Specification | No | No specific hardware details were provided. |
| Software Dependencies | No | TBRIL... TILDE [Blockeel and De Raedt, 1998] is an algorithm designed to do classification and regression over relational data. It is a decision tree learner similar to C4.5 [Quinlan, 1993]. |
| Experiment Setup | Yes | The main parameters are set as follows: 10 trees of maximum depth 4 are learned by TBRIL during the SBC step 1 and the reward is learned with a tree of depth 4, which acts as a regularization parameter. |
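
The data-generation procedure quoted in the Open Datasets row can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the integer "counter" environment, the action names `inc` and `dec`, the helper names, the `max_steps` safety cap, and the choice of uniformly random actions for the one-step trajectories. Only the counts (Nexpert = 15, Nrandom = 300) and the stop-at-first-wait rule come from the quoted text.

```python
import random

WAIT = "wait"
ACTIONS = ["inc", "dec", WAIT]  # hypothetical action set for a toy domain


def sample_state(rng):
    """Draw a random start state (toy domain: an integer in 0..5)."""
    return rng.randint(0, 5)


def step(state, action):
    """Toy deterministic dynamics; 'wait' leaves the state unchanged."""
    if action == "inc":
        return min(state + 1, 5)
    if action == "dec":
        return max(state - 1, 0)
    return state


def expert_policy(state):
    """Hypothetical expert: increment until the goal value 5, then wait."""
    return WAIT if state == 5 else "inc"


def collect_expert_trajectories(n_expert, rng, max_steps=100):
    """Roll out the expert from random states until it first selects WAIT."""
    trajectories = []
    for _ in range(n_expert):
        state = sample_state(rng)
        trajectory = []
        for _ in range(max_steps):  # safety cap, an assumption of this sketch
            action = expert_policy(state)
            trajectory.append((state, action))
            if action == WAIT:
                break
            state = step(state, action)
        trajectories.append(trajectory)
    return trajectories


def collect_random_transitions(n_random, rng):
    """One-step trajectories from random states (uniform actions assumed)."""
    transitions = []
    for _ in range(n_random):
        state = sample_state(rng)
        action = rng.choice(ACTIONS)
        transitions.append((state, action, step(state, action)))
    return transitions


rng = random.Random(0)
expert_demos = collect_expert_trajectories(n_expert=15, rng=rng)
random_demos = collect_random_transitions(n_random=300, rng=rng)
```

In this sketch each expert trajectory is a list of (state, action) pairs whose final action is `wait`, and each random demonstration is a single (state, action, next state) triple. A relational domain would replace the integer state with a set of ground predicates.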