Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism

Authors: Haoyuan Cai, Zhenghao Peng, Bolei Zhou

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we conduct experiments to answer the following questions: (1) Does our algorithm require fewer expert demonstrations and efforts to learn a near-optimal policy than other interactive imitation learning methods? (2) Does the learned intervention criterion help the agent receive sufficient human guidance at safety-critical states, thereby capturing all necessary information for effectively imitating the expert? To investigate these questions, we conduct experiments on various reinforcement learning tasks with different state and action spaces. We consider the MetaDrive driving experiments (Li et al., 2022a) with continuous action spaces and the MiniGrid Four Room task (Chevalier-Boisvert et al., 2018) with discrete action spaces.
Researcher Affiliation Academia 1Computer Science Department, University of California, Los Angeles (UCLA). Correspondence to: Bolei Zhou <EMAIL>.
Pseudocode Yes Algorithm 1 Human-Gated Interactive Imitation Learning
Open Source Code Yes Code and demo video are available at https://github.com/metadriverse/AIM.
Open Datasets Yes We consider the MetaDrive driving experiments (Li et al., 2022a) with continuous action spaces and the MiniGrid Four Room task (Chevalier-Boisvert et al., 2018) with discrete action spaces.
Dataset Splits Yes We evaluate the agent's learned policy in a held-out test environment separate from the training environments. ... The training split consists of 50 unique maps, and the test split contains another 50 distinct maps.
Hardware Specification No No specific hardware details (like GPU/CPU models) are mentioned in the paper related to the experiments.
Software Dependencies No The paper mentions environments like MetaDrive and MiniGrid, and algorithms like PPO-Lagrangian, but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup Yes C. Hyperparameters In MetaDrive, we implement the control policy and the proxy Q-network using a two-layer MLP, where each hidden layer has 256 units with ReLU activations. For MiniGrid tasks, all models employ a three-layer convolutional network with filter sizes of 16, 16, and 32. Each convolution uses a 2×2 kernel, and a max-pooling layer is inserted between the first and second convolutional layers. The ReLU activation is applied after each layer. When training AIM and the robot-gated baselines Ensemble-DAgger and Thrifty-DAgger, we use the same switch-to-agent threshold following Eq. 6. Table 4. AIM (MetaDrive) hyperparameters: Discount Factor γ = 0.99, Learning Rate = 1e-4, Gradient Steps per Iteration = 1, Train Batch Size = 1024, Switch-to-Human Quantile δ = 0.05.
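The MetaDrive architecture quoted above (a two-layer MLP with 256-unit ReLU hidden layers) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the input dimension (259) and output dimension (2) are hypothetical placeholders, since the paper excerpt does not state the observation or action sizes, and the He-style weight initialization is an assumption.

```python
import numpy as np

def relu(x):
    # Element-wise ReLU activation, applied after each hidden layer.
    return np.maximum(x, 0.0)

def init_mlp(in_dim, out_dim, hidden=256, seed=0):
    """Build a two-hidden-layer MLP (256 units each, per the paper's
    MetaDrive setup). He-style initialization is an assumption here."""
    rng = np.random.default_rng(seed)
    dims = [in_dim, hidden, hidden, out_dim]
    params = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_out))
        b = np.zeros(d_out)
        params.append((W, b))
    return params

def mlp_forward(params, x):
    """Forward pass: ReLU after each hidden layer, linear output head."""
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return h @ W + b

# Hypothetical dimensions; batch size 1024 matches the quoted train batch size.
params = init_mlp(in_dim=259, out_dim=2)
actions = mlp_forward(params, np.zeros((1024, 259)))
```

The same forward structure would serve both the control policy and the proxy Q-network mentioned in the quote, differing only in output dimension.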