Efficient Active Imitation Learning with Random Network Distillation
Authors: Emilien Biré, Anthony Kobanda, Ludovic Denoyer, Rémy Portelas
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main contributions are threefold: i) we propose RND-DAgger, a novel interactive imitation learning approach leveraging state-based out-of-distribution identification through random network distillation; ii) we perform a comparative analysis of RND-DAgger and existing methods on three tasks: a robotics scenario and two video-game environments; iii) throughout these experiments, we demonstrate that RND-DAgger either outperforms or matches existing approaches in terms of final performance while significantly reducing expert burden. |
| Researcher Affiliation | Collaboration | Emilien Biré (Centrale Supelec), Anthony Kobanda (Ubisoft La Forge), Ludovic Denoyer (H Company), Rémy Portelas (Ubisoft La Forge) |
| Pseudocode | Yes | Algorithm 1: DAgger; Algorithm 2: Lazy/Ensemble DAgger; Algorithm 3: Ensemble-DAgger's CONDITION; Algorithm 4: Lazy-DAgger's CONDITION; Algorithm 5: RND-DAgger |
| Open Source Code | Yes | To ensure the reproducibility of our work, we provide detailed pseudo-code in section 2 and section 3. A comprehensive open-source codebase, including all environments, datasets, oracle model checkpoints, active learning algorithms, and a detailed guide on how to reproduce our experiments and results is available at https://sites.google.com/view/rnd-dagger. |
| Open Datasets | Yes | Our first environment is Half Cheetah, a classical reinforcement learning environment (https://github.com/araffin/pybullet_envs_gymnasium) where the objective is to learn a running strategy for the agent. ... We also propose and open-source two new environments developed for video game research. Race Car (see Figure 5) features a physics-based car controller... Finally, the 3D Maze environment allows us to study our strategy in goal-conditioned navigation scenarios. A comprehensive open-source codebase, including all environments, datasets, oracle model checkpoints, active learning algorithms, and a detailed guide on how to reproduce our experiments and results, is available at https://sites.google.com/view/rnd-dagger. |
| Dataset Splits | No | The paper mentions collecting an initial training set and then iteratively expanding it, but it does not specify explicit train/validation/test splits with percentages, counts, or methods for partitioning the data for evaluation purposes. |
| Hardware Specification | No | This work was granted access to the HPC resources of IDRIS under the allocation 2024AD011015218 made by GENCI. |
| Software Dependencies | No | The paper does not explicitly list specific software components with their version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) in the main text or appendices. |
| Experiment Setup | Yes | Hyperparameters: for each decision rule, several key hyperparameters had to be tuned. DAgger: the probability β of a frame being controlled by the bot. ... RND-DAgger: the OOD-detection threshold λ; the history context length...; the minimal expert time W; the size of the random network... Ensemble-DAgger: the threshold τ for the discrepancy measure; the threshold χ for the doubt measure; the number of models N. Lazy-DAgger: the threshold βH for the discrepancy measure; the threshold βR for the backward controlled loop. ... Table 2 summarizes the values used for our grid search. |
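The contribution statement above hinges on using random network distillation (RND) as a state-based out-of-distribution detector. The core mechanism can be illustrated with a minimal NumPy sketch: a frozen, randomly initialized target network and a trainable predictor, where the predictor's error on a state serves as the novelty score. The class name, network sizes, and linear predictor below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class RNDDetector:
    """Minimal sketch of random network distillation for OOD detection.

    A fixed random "target" network maps states to features; a predictor
    is trained to imitate it on in-distribution states. States the
    predictor has not been fit on yield large prediction errors, which
    we read as an out-of-distribution score.
    """

    def __init__(self, state_dim, feat_dim=32, lr=1e-2, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen random target network (never trained).
        self.W_target = rng.normal(size=(state_dim, feat_dim))
        # Trainable linear predictor, initialized near zero.
        self.W_pred = rng.normal(size=(state_dim, feat_dim)) * 0.01
        self.lr = lr

    def score(self, s):
        """Prediction error = novelty score for state s."""
        target = np.tanh(s @ self.W_target)
        pred = s @ self.W_pred
        return float(np.mean((pred - target) ** 2))

    def update(self, s):
        """One SGD step fitting the predictor to the frozen target."""
        target = np.tanh(s @ self.W_target)
        pred = s @ self.W_pred
        grad = 2.0 * np.outer(s, pred - target) / pred.size
        self.W_pred -= self.lr * grad
```

After fitting the predictor on states drawn from the training distribution, a far-away state produces a markedly higher score, which is exactly the signal a threshold λ can gate on.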
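The hyperparameters listed for RND-DAgger (an OOD threshold λ and a minimal expert time W) suggest a simple gating rule: hand control to the expert when the OOD score exceeds λ, and keep the expert in control for at least W steps. The sketch below illustrates that rule; the environment, policy, expert, and detector interfaces are assumptions for illustration, not the authors' code.

```python
def rnd_dagger_rollout(env, policy, expert, detector, horizon,
                       ood_threshold, min_expert_steps):
    """Sketch of an RND-DAgger-style control loop.

    The learner's policy acts by default; when the detector flags the
    current state as out-of-distribution (score > ood_threshold), the
    expert takes over for at least `min_expert_steps` steps. Only
    expert-controlled frames are labeled and added to the dataset.
    """
    dataset = []       # (state, expert_action) pairs for retraining
    s = env.reset()
    expert_timer = 0
    for _ in range(horizon):
        if detector.score(s) > ood_threshold:
            expert_timer = min_expert_steps   # (re)start expert control
        if expert_timer > 0:
            a = expert(s)
            dataset.append((s, a))            # label only expert frames
            detector.update(s)                # these states become in-distribution
            expert_timer -= 1
        else:
            a = policy(s)
        s, done = env.step(a)
        if done:
            break
    return dataset
```

The minimal expert time W prevents rapid back-and-forth switching between learner and expert, which is the expert-burden problem the paper's evaluation targets.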