Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism

Authors: Haoyuan Cai, Zhenghao Peng, Bolei Zhou

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we conduct experiments to answer the following questions: (1) Does our algorithm require fewer expert demonstrations and efforts to learn a near-optimal policy than other interactive imitation learning methods? (2) Does the learned intervention criterion help the agent receive sufficient human guidance at safety-critical states, thereby capturing all necessary information for effectively imitating the expert? To investigate these questions, we conduct experiments on various reinforcement learning tasks with different state and action spaces. We consider the MetaDrive driving experiments (Li et al., 2022a) with continuous action spaces and the MiniGrid Four Room task (Chevalier-Boisvert et al., 2018) with discrete action spaces.
Researcher Affiliation Academia 1Computer Science Department, University of California, Los Angeles (UCLA). Correspondence to: Bolei Zhou <EMAIL>.
Pseudocode Yes Algorithm 1 Human-Gated Interactive Imitation Learning
Open Source Code Yes Code and demo video are available at https://github.com/metadriverse/AIM.
Open Datasets Yes We consider the MetaDrive driving experiments (Li et al., 2022a) with continuous action spaces and the MiniGrid Four Room task (Chevalier-Boisvert et al., 2018) with discrete action spaces.
Dataset Splits Yes We evaluate the agent's learned policy in a held-out test environment separate from the training environments. ... The training split consists of 50 unique maps, and the test split contains another 50 distinct maps.
Hardware Specification No No specific hardware details (like GPU/CPU models) are mentioned in the paper related to the experiments.
Software Dependencies No The paper mentions environments like MetaDrive and MiniGrid, and algorithms like PPO-Lagrangian, but does not provide specific version numbers for any software dependencies or libraries.
Experiment Setup Yes C. Hyperparameters In MetaDrive, we implement the control policy and the proxy Q-network using a two-layer MLP, where each hidden layer has 256 units with ReLU activations. For MiniGrid tasks, all models employ a three-layer convolutional network with filter sizes of 16, 16, and 32. Each convolution uses a 2×2 kernel, and a max-pooling layer is inserted between the first and second convolutional layers. The ReLU activation is applied after each layer. When training AIM and the robot-gated baselines Ensemble-DAgger and Thrifty-DAgger, we use the same switch-to-agent threshold following Eq. 6. Table 4. AIM (MetaDrive) hyperparameters: Discount Factor γ = 0.99, Learning Rate = 1e-4, Gradient Steps per Iteration = 1, Train Batch Size = 1024, Switch-to-Human Quantile δ = 0.05.
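The MetaDrive architecture quoted above (a two-layer MLP with 256-unit ReLU hidden layers) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the input dimension (259) and output dimension (2) are hypothetical placeholders, since the paper excerpt does not state the observation or action sizes, and the He-style weight initialization is an assumption.

```python
import numpy as np

def relu(x):
    # Element-wise ReLU activation, applied after each hidden layer.
    return np.maximum(x, 0.0)

def init_mlp(in_dim, out_dim, hidden=256, seed=0):
    """Build a two-hidden-layer MLP (256 units each, per the paper's
    MetaDrive setup). He-style initialization is an assumption here."""
    rng = np.random.default_rng(seed)
    dims = [in_dim, hidden, hidden, out_dim]
    params = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / d_in), size=(d_in, d_out))
        b = np.zeros(d_out)
        params.append((W, b))
    return params

def mlp_forward(params, x):
    """Forward pass: ReLU after each hidden layer, linear output head."""
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return h @ W + b

# Hypothetical dimensions; batch size 1024 matches the quoted train batch size.
params = init_mlp(in_dim=259, out_dim=2)
actions = mlp_forward(params, np.zeros((1024, 259)))
```

The same forward structure would serve both the control policy and the proxy Q-network mentioned in the quote, differing only in output dimension.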