reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Inverse Reinforcement Learning in Relational Domains

Authors: Thibaut Munzer, Bilal Piot, Matthieu Geist, Olivier Pietquin, Manuel Lopes

IJCAI 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	To validate the proposed approach, experiments have been run to (i) confirm RCSI can learn a relational reward from demonstrations, (ii) study the influence of the different parameters and (iii) show that IRL outperforms classification based imitation learning when dealing with transfer and changes in dynamics.
Researcher Affiliation	Academia	Thibaut Munzer (Inria, Bordeaux, France); Bilal Piot (University Lille 1, Lille, France); Matthieu Geist (Supelec, Metz, France); Olivier Pietquin (University Lille 1, Lille, France); Manuel Lopes (Inria, Bordeaux, France)
Pseudocode	No	Figure 1: Sketch of the proposed method: CSI with reward shaping.
Open Source Code	No	No statement about open-source code for their method was found.
Open Datasets	No	From a target reward R , we compute an optimal policy π . The algorithm is given, as expert demonstrations, Nexpert trajectories starting from a random state and ending when the (first) wait action is selected. As random demonstrations, the algorithm is given Nrandom one-step trajectories starting from random states.
Dataset Splits	No	The setting Nrandom = 300 and Nexpert = 15 gives good results and so we will use it in the following experiments.
Hardware Specification	No	No specific hardware details were provided.
Software Dependencies	No	TBRIL... TILDE [Blockeel and De Raedt, 1998] is an algorithm designed to do classification and regression over relational data. It is a decision tree learner similar to C4.5 [Quinlan, 1993].
Experiment Setup	Yes	The main parameters are set as follows: 10 trees of maximum depth 4 are learned by TBRIL during the SBC step 1 and the reward is learned with a tree of depth 4, which acts as a regularization parameter.
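
The sampling protocol quoted in the Open Datasets and Dataset Splits rows can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the toy line-world domain, the names `step`, `expert_policy` and `sample_demonstrations`, and the `max_steps` safety cap are all assumptions made for the example. Only the counts Nexpert = 15 and Nrandom = 300 and the rule that an expert trajectory ends at the first wait action come from the quoted text.

```python
import random

N_EXPERT = 15    # expert trajectories (value quoted in the table)
N_RANDOM = 300   # one-step random trajectories (value quoted in the table)

# Hypothetical toy domain: states 0..9 on a line, goal at state 9.
STATES = list(range(10))
ACTIONS = ["left", "right", "wait"]
GOAL = 9


def step(state, action):
    """Deterministic toy dynamics (an assumption, not the paper's domain)."""
    if action == "left":
        return max(state - 1, 0)
    if action == "right":
        return min(state + 1, GOAL)
    return state  # "wait" leaves the state unchanged


def expert_policy(state):
    """Stand-in for the optimal policy: walk to the goal, then wait."""
    return "wait" if state == GOAL else "right"


def sample_demonstrations(rng, max_steps=100):
    """Return (expert trajectories, random one-step transitions)."""
    expert = []
    for _ in range(N_EXPERT):
        state = rng.choice(STATES)  # start from a random state
        trajectory = []
        for _ in range(max_steps):  # safety cap, hypothetical
            action = expert_policy(state)
            trajectory.append((state, action))
            if action == "wait":  # stop at the first wait action
                break
            state = step(state, action)
        expert.append(trajectory)

    random_steps = []
    for _ in range(N_RANDOM):
        state = rng.choice(STATES)
        action = rng.choice(ACTIONS)
        random_steps.append((state, action, step(state, action)))
    return expert, random_steps


expert_demos, random_demos = sample_demonstrations(random.Random(0))
```

In this sketch each expert trajectory is a list of (state, action) pairs whose last action is always wait, and each random demonstration is a single (state, action, next state) transition.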