KD-BIRL: Kernel Density Bayesian Inverse Reinforcement Learning
Authors: Aishwarya Mandyam, Didong Li, Andrew Jones, Diana Cai, Barbara E Engelhardt
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results highlight KD-BIRL's faster concentration rate in comparison to baselines, particularly in low test-task expert demonstration data regimes. Additionally, we are the first to provide theoretical guarantees of posterior concentration for a Bayesian IRL algorithm. Taken together, this work introduces a principled and theoretically grounded framework that enables Bayesian IRL to be applied across a variety of domains. |
| Researcher Affiliation | Academia | Aishwarya Mandyam (EMAIL), Department of Computer Science, Stanford University; Didong Li (EMAIL), Department of Biostatistics, University of North Carolina; Diana Cai (dcai@flatironinstitute.org), Flatiron Institute; Andrew Jones (EMAIL), Department of Computer Science, Princeton University; Barbara E. Engelhardt (EMAIL), Gladstone Institutes and Department of Biomedical Data Science, Stanford University |
| Pseudocode | Yes | We use a Hamiltonian Monte Carlo algorithm (Team, 2011) (details in Appendix F, and Algorithm 1) which is suited to large parameter spaces. |
| Open Source Code | No | No explicit statement about open-source code for the described methodology or a repository link was found in the paper. |
| Open Datasets | No | The first is a Gridworld setting with a discrete state space. We use three grid sizes (2×2, 5×5, and 10×10) to investigate how KD-BIRL's performance scales. The second setting is a simulated sepsis treatment environment (Amirhossein Kiani, 2019), which has a continuous state space and is thus more challenging. |
| Dataset Splits | No | We assume that we have several training tasks and a single test task. For each training task, we have access to both optimal demonstrations from the corresponding expert RL agent, and know the reward function the expert is optimizing for. Specifically, there are m samples in the training dataset {(s_j, a_j, R_j)}_{j=1}^m ... Our goal is to learn the unknown reward function of a new test task given a limited amount of expert demonstrations of the new test task, {(s^e_i)}_{i=1}^n, (n ≪ m). |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instances with specifications) were provided in the paper. |
| Software Dependencies | Yes | We use a Hamiltonian Monte Carlo algorithm (Team, 2011) (details in Appendix F, and Algorithm 1) which is suited to large parameter spaces. |
| Experiment Setup | Yes | We choose the bandwidth hyperparameters h, h′ using rule-of-thumb procedures (Silverman, 1986). These procedures define the optimal bandwidth hyperparameters as the variance of the pairwise distance between the training data demonstrations and the training data reward functions respectively. ... AVRIL uses variational inference to approximate the posterior distribution on the reward function... AVRIL is initialized using an informative prior learned from the training tasks. ... p(R) = N(µ₀, σ₀²), where µ₀ = (1/m) Σ_{j=1}^m R_j(s_j, a_j) and σ₀² = (1/m) Σ_{j=1}^m (R_j(s_j, a_j) − µ₀)². |
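The two quantities quoted in the Experiment Setup row — the rule-of-thumb bandwidth (variance of pairwise distances over training data) and the informative Normal prior with empirical moments — can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names, the use of Euclidean distance, and the reduction over distinct pairs are assumptions, since the quoted text only states which moments are used.

```python
import numpy as np

def rule_of_thumb_bandwidth(X):
    """Bandwidth chosen as the variance of pairwise Euclidean distances
    between rows of X (the stated rule for h and h'). X: (n, d) array."""
    diffs = X[:, None, :] - X[None, :, :]          # (n, n, d) differences
    dists = np.sqrt((diffs ** 2).sum(axis=-1))     # (n, n) distance matrix
    iu = np.triu_indices(len(X), k=1)              # distinct pairs only
    return float(np.var(dists[iu]))

def empirical_normal_prior(rewards):
    """Informative prior p(R) = N(mu0, sigma0^2), with mu0 and sigma0^2
    taken as the mean and variance of the training rewards R_j(s_j, a_j)."""
    rewards = np.asarray(rewards, dtype=float)
    return float(np.mean(rewards)), float(np.var(rewards))
```

For example, `empirical_normal_prior([1.0, 2.0, 3.0])` gives the prior moments (2.0, 2/3); the bandwidth helper would be applied once to the demonstration states for h and once to the reward values for h′.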