Efficient Active Imitation Learning with Random Network Distillation

Authors: Emilien Biré, Anthony Kobanda, Ludovic Denoyer, Rémy Portelas

ICLR 2025

Reproducibility assessment: each entry lists the variable, the result, and the supporting LLM response.
Research Type (Experimental): "Our main contributions are threefold: (i) we propose RND-DAgger, a novel interactive imitation learning approach that leverages state-based out-of-distribution detection through random network distillation; (ii) we perform a comparative analysis of RND-DAgger and existing methods on three tasks: a robotics scenario and two video-game environments; (iii) throughout these experiments, we demonstrate that RND-DAgger either outperforms or matches existing approaches in final performance while significantly reducing expert burden."
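The core mechanism named above, random network distillation for state-based OOD detection, can be sketched as follows. This is a minimal illustrative implementation, not the paper's: the network sizes, the linear predictor, and the quantile-based threshold rule are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, out = 4, 16, 8  # state dim, hidden width, embedding dim (assumed)

# Fixed, randomly initialized target network f(s) = W2 @ tanh(W1 @ s).
W1 = rng.normal(size=(h, d))
W2 = rng.normal(size=(out, h))
target = lambda S: np.tanh(S @ W1.T) @ W2.T

# In-distribution states (stand-in for states visited during training).
S_train = 0.5 * rng.normal(size=(500, d))

# Predictor: here a simple linear model fit to match the target
# on in-distribution data (the paper trains a network instead).
X = np.hstack([S_train, np.ones((len(S_train), 1))])
coef, *_ = np.linalg.lstsq(X, target(S_train), rcond=None)

def rnd_score(s):
    """Squared prediction error; large values flag novel (OOD) states."""
    pred = np.append(s, 1.0) @ coef
    return float(np.sum((pred - target(s[None, :])[0]) ** 2))

# Calibrate an OOD threshold lambda from in-distribution scores
# (illustrative rule; the paper grid-searches this threshold).
scores_in = np.array([rnd_score(s) for s in S_train])
lam = np.quantile(scores_in, 0.99)

s_ood = 5.0 * np.ones(d)          # a state far from the training region
needs_expert = rnd_score(s_ood) > lam
```

The predictor fits the target well where data exists, so its error spikes on unfamiliar states; comparing that error to a calibrated threshold yields the expert-handoff signal.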
Researcher Affiliation (Collaboration): Emilien Biré (1), Anthony Kobanda (2), Ludovic Denoyer (3), Rémy Portelas (2); 1 CentraleSupélec, 2 Ubisoft La Forge, 3 H Company.
Pseudocode (Yes): Algorithm 1 (DAgger), Algorithm 2 (Lazy/Ensemble DAgger), Algorithm 3 (Ensemble-DAgger's CONDITION), Algorithm 4 (Lazy-DAgger's CONDITION), Algorithm 5 (RND-DAgger).
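The listed algorithms all share the DAgger skeleton: roll out a mixture of expert and learner control, label every visited state with the expert's action, aggregate, and retrain. A minimal sketch of that skeleton is below; the environment, expert, learner, and decay schedule are toy stand-ins invented for illustration, not the paper's.

```python
import random

def dagger(env_reset, env_step, expert, train, n_iters=3, horizon=20, beta0=1.0):
    """Generic DAgger loop (sketch). beta decays so the expert controls
    fewer frames over iterations while still labeling every visited state."""
    dataset, policy = [], expert  # first iteration behaves like behavior cloning
    for i in range(n_iters):
        beta = beta0 * (0.5 ** i)           # assumed decay schedule
        s = env_reset()
        for _ in range(horizon):
            # Mixture policy: the expert controls the frame with probability beta.
            a = expert(s) if random.random() < beta else policy(s)
            dataset.append((s, expert(s)))  # expert relabels every state
            s = env_step(s, a)
        policy = train(dataset)             # retrain learner on the aggregate
    return policy, dataset

# Toy 1-D chain where the "expert" always moves right.
policy, data = dagger(
    env_reset=lambda: 0,
    env_step=lambda s, a: s + a,
    expert=lambda s: 1,
    train=lambda ds: (lambda s: max(a for _, a in ds)),  # trivial "learner"
)
```

The variants in Algorithms 2-5 differ only in the gating rule: Lazy- and Ensemble-DAgger replace the Bernoulli(beta) coin flip with their CONDITION checks, and RND-DAgger replaces it with the RND-based OOD score.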
Open Source Code (Yes): "To ensure the reproducibility of our work, we provide detailed pseudo-code in Sections 2 and 3. A comprehensive open-source codebase, including all environments, datasets, oracle model checkpoints, active learning algorithms, and a detailed guide on how to reproduce our experiments and results, is available at https://sites.google.com/view/rnd-dagger."
Open Datasets (Yes): "Our first environment is Half Cheetah, a classical reinforcement learning environment (https://github.com/araffin/pybullet_envs_gymnasium) in which the objective is to learn a running strategy for the agent. ... We also propose and open-source two new environments developed for video game research. Race Car (see Figure 5) features a physics-based car controller... Finally, the 3D Maze environment allows us to study our strategy in goal-conditioned navigation scenarios." The open-source codebase at https://sites.google.com/view/rnd-dagger includes all environments and datasets.
Dataset Splits (No): The paper mentions collecting an initial training set and then iteratively expanding it, but it does not specify explicit train/validation/test splits (percentages, counts, or partitioning method) for evaluation.
Hardware Specification (No): The only hardware-related statement is an acknowledgment: "This work was granted access to the HPC resources of IDRIS under the allocation 2024AD011015218 made by GENCI."
Software Dependencies (No): The paper does not list specific software components with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) in the main text or appendices.
Experiment Setup (Yes): "Hyperparameters. For each decision rule, several key hyperparameters had to be tuned. DAgger: the probability β that a frame is controlled by the bot. ... RND-DAgger: the OOD-detection threshold λ; the history context length; the minimal expert time W; the size of the random network. Ensemble-DAgger: the threshold τ for the discrepancy measure; the threshold χ for the doubt measure; the number of models N. Lazy-DAgger: the threshold βH for the discrepancy measure; the threshold βR for the backward controlled loop. ... Table 2 summarizes the values used for our grid search."
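Enumerating a grid search over per-rule hyperparameters like these is mechanical; the sketch below shows the pattern for the RND-DAgger parameters. The value lists are placeholders invented for illustration, not the actual grids from Table 2.

```python
from itertools import product

# Hypothetical search spaces (illustrative values only, not Table 2's).
grid = {
    "lambda_ood": [0.1, 0.5, 1.0],   # OOD-detection threshold lambda
    "context_len": [1, 4],           # history context length
    "min_expert_time": [5, 10],      # minimal expert time W
}

# Cartesian product of all value lists -> one dict per configuration.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
# 3 * 2 * 2 = 12 configurations, each then evaluated (e.g., by mean return).
```

Each resulting config dict can be passed to a training run; the best configuration is then selected on held-out evaluation episodes.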