Efficient Active Imitation Learning with Random Network Distillation

Authors: Emilien Biré, Anthony Kobanda, Ludovic Denoyer, Rémy Portelas

ICLR 2025

Reproducibility assessment: each entry lists the variable, the result, and the supporting LLM response.
Research Type (Experimental): "Our main contributions are threefold: (i) we propose RND-DAgger, a novel interactive imitation learning approach that leverages state-based out-of-distribution detection through random network distillation; (ii) we perform a comparative analysis of RND-DAgger and existing methods on three tasks: a robotics scenario and two video-game environments; (iii) throughout these experiments, we demonstrate that RND-DAgger either outperforms or matches existing approaches in final performance while significantly reducing expert burden."
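The core mechanism named above, random network distillation for state-based OOD detection, can be sketched as follows. This is a minimal illustrative implementation, not the paper's: the network sizes, the linear predictor, and the quantile-based threshold rule are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, out = 4, 16, 8  # state dim, hidden width, embedding dim (assumed)

# Fixed, randomly initialized target network f(s) = W2 @ tanh(W1 @ s).
W1 = rng.normal(size=(h, d))
W2 = rng.normal(size=(out, h))
target = lambda S: np.tanh(S @ W1.T) @ W2.T

# In-distribution states (stand-in for states visited during training).
S_train = 0.5 * rng.normal(size=(500, d))

# Predictor: here a simple linear model fit to match the target
# on in-distribution data (the paper trains a network instead).
X = np.hstack([S_train, np.ones((len(S_train), 1))])
coef, *_ = np.linalg.lstsq(X, target(S_train), rcond=None)

def rnd_score(s):
    """Squared prediction error; large values flag novel (OOD) states."""
    pred = np.append(s, 1.0) @ coef
    return float(np.sum((pred - target(s[None, :])[0]) ** 2))

# Calibrate an OOD threshold lambda from in-distribution scores
# (illustrative rule; the paper grid-searches this threshold).
scores_in = np.array([rnd_score(s) for s in S_train])
lam = np.quantile(scores_in, 0.99)

s_ood = 5.0 * np.ones(d)          # a state far from the training region
needs_expert = rnd_score(s_ood) > lam
```

The predictor fits the target well where data exists, so its error spikes on unfamiliar states; comparing that error to a calibrated threshold yields the expert-handoff signal.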
Researcher Affiliation (Collaboration): Emilien Biré (1), Anthony Kobanda (2), Ludovic Denoyer (3), Rémy Portelas (2); 1 CentraleSupélec, 2 Ubisoft La Forge, 3 H Company.
Pseudocode (Yes): Algorithm 1 (DAgger), Algorithm 2 (Lazy/Ensemble DAgger), Algorithm 3 (Ensemble-DAgger's CONDITION), Algorithm 4 (Lazy-DAgger's CONDITION), Algorithm 5 (RND-DAgger).
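The listed algorithms all share the DAgger skeleton: roll out a mixture of expert and learner control, label every visited state with the expert's action, aggregate, and retrain. A minimal sketch of that skeleton is below; the environment, expert, learner, and decay schedule are toy stand-ins invented for illustration, not the paper's.

```python
import random

def dagger(env_reset, env_step, expert, train, n_iters=3, horizon=20, beta0=1.0):
    """Generic DAgger loop (sketch). beta decays so the expert controls
    fewer frames over iterations while still labeling every visited state."""
    dataset, policy = [], expert  # first iteration behaves like behavior cloning
    for i in range(n_iters):
        beta = beta0 * (0.5 ** i)           # assumed decay schedule
        s = env_reset()
        for _ in range(horizon):
            # Mixture policy: the expert controls the frame with probability beta.
            a = expert(s) if random.random() < beta else policy(s)
            dataset.append((s, expert(s)))  # expert relabels every state
            s = env_step(s, a)
        policy = train(dataset)             # retrain learner on the aggregate
    return policy, dataset

# Toy 1-D chain where the "expert" always moves right.
policy, data = dagger(
    env_reset=lambda: 0,
    env_step=lambda s, a: s + a,
    expert=lambda s: 1,
    train=lambda ds: (lambda s: max(a for _, a in ds)),  # trivial "learner"
)
```

The variants in Algorithms 2-5 differ only in the gating rule: Lazy- and Ensemble-DAgger replace the Bernoulli(beta) coin flip with their CONDITION checks, and RND-DAgger replaces it with the RND-based OOD score.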
Open Source Code (Yes): "To ensure the reproducibility of our work, we provide detailed pseudo-code in Sections 2 and 3. A comprehensive open-source codebase, including all environments, datasets, oracle model checkpoints, active learning algorithms, and a detailed guide on how to reproduce our experiments and results, is available at https://sites.google.com/view/rnd-dagger."
Open Datasets (Yes): "Our first environment is Half Cheetah, a classical reinforcement learning environment (https://github.com/araffin/pybullet_envs_gymnasium) in which the objective is to learn a running strategy for the agent. ... We also propose and open-source two new environments developed for video game research. Race Car (see Figure 5) features a physics-based car controller... Finally, the 3D Maze environment allows us to study our strategy in goal-conditioned navigation scenarios." The open-source codebase at https://sites.google.com/view/rnd-dagger includes all environments and datasets.
Dataset Splits (No): The paper mentions collecting an initial training set and then iteratively expanding it, but it does not specify explicit train/validation/test splits (percentages, counts, or partitioning method) for evaluation.
Hardware Specification (No): The only hardware-related statement is an acknowledgment: "This work was granted access to the HPC resources of IDRIS under the allocation 2024AD011015218 made by GENCI."
Software Dependencies (No): The paper does not list specific software components with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) in the main text or appendices.
Experiment Setup (Yes): "Hyperparameters. For each decision rule, several key hyperparameters had to be tuned. DAgger: the probability β that a frame is controlled by the bot. ... RND-DAgger: the OOD-detection threshold λ; the history context length; the minimal expert time W; the size of the random network. Ensemble-DAgger: the threshold τ for the discrepancy measure; the threshold χ for the doubt measure; the number of models N. Lazy-DAgger: the threshold βH for the discrepancy measure; the threshold βR for the backward controlled loop. ... Table 2 summarizes the values used for our grid search."
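Enumerating a grid search over per-rule hyperparameters like these is mechanical; the sketch below shows the pattern for the RND-DAgger parameters. The value lists are placeholders invented for illustration, not the actual grids from Table 2.

```python
from itertools import product

# Hypothetical search spaces (illustrative values only, not Table 2's).
grid = {
    "lambda_ood": [0.1, 0.5, 1.0],   # OOD-detection threshold lambda
    "context_len": [1, 4],           # history context length
    "min_expert_time": [5, 10],      # minimal expert time W
}

# Cartesian product of all value lists -> one dict per configuration.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
# 3 * 2 * 2 = 12 configurations, each then evaluated (e.g., by mean return).
```

Each resulting config dict can be passed to a training run; the best configuration is then selected on held-out evaluation episodes.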