Efficient Imitation under Misspecification
Authors: Nicolas Espinosa Dice, Sanjiban Choudhury, Wen Sun, Gokul Swamy
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we corroborate our theory by empirically investigating several potential sources of misspecification, showing that non-expert reset distributions are preferable under misspecification. |
| Researcher Affiliation | Academia | Department of Computer Science, Cornell University EMAIL; Gokul Swamy, Robotics Institute, Carnegie Mellon University EMAIL |
| Pseudocode | Yes | Algorithm 1: Reset-Based IRL (Dual; Swamy et al., 2023); Algorithm 2: GUiding ImiTaters with Arbitrary Resets (GUITAR); Algorithm 3: Policy Search via Dynamic Programming (Bagnell et al., 2003) |
| Open Source Code | Yes | We release a forked version of Ren et al. (2024)'s code: https://nico-espinosadice.github.io/efficient-IRL/. |
| Open Datasets | Yes | For the two Antmaze-Large tasks, we use the data provided by Fu et al. (2020) as the expert demonstrations. |
| Dataset Splits | No | The paper describes collecting 100,000 state-action samples for expert and offline data and splitting expert data into 'short trajectories (of length less than 500)' and 'long trajectories (of length greater than or equal to 500)' for reset distributions. However, it does not specify explicit train/test/validation dataset splits with percentages, absolute counts, or references to standard splits for model evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud computing instance types used for running the experiments. It mentions running tasks in a 'MuJoCo' environment but no associated hardware. |
| Software Dependencies | No | The paper mentions using 'Soft Actor Critic (Haarnoja et al., 2018) implementation provided by Raffin et al. (2021)', 'TD3+BC implementation of Fujimoto & Gu (2021)', and 'Optimistic Adam (Daskalakis et al., 2017)'. While specific implementations and algorithms are cited, the paper does not provide version numbers for these software libraries or environments (e.g., PyTorch version, Stable-Baselines3 version). |
| Experiment Setup | Yes | Table 1: Hyperparameters for baselines using SAC — buffer size 1e6; batch size 256; γ = 0.98; τ = 0.02; training freq. 64; gradient steps 64; learning rate: lin. sched. 7.3e-4; policy architecture 256 × 2; state-dependent exploration: true; training timesteps 1e6 |
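For concreteness, the Table 1 hyperparameters quoted above can be transcribed into a keyword dict in the style of the Stable-Baselines3 SAC constructor (the paper cites the Raffin et al., 2021 implementation). This is a hedged sketch: the key names and the linear-schedule helper follow SB3 conventions and are our assumption, not code from the paper's repository.

```python
def linear_schedule(initial_value: float):
    """SB3-style schedule: anneal linearly from initial_value to 0
    as remaining training progress goes from 1.0 to 0.0."""
    def schedule(progress_remaining: float) -> float:
        return progress_remaining * initial_value
    return schedule

# Transcription of the paper's Table 1 (SAC baselines); key names assume
# the Stable-Baselines3 SAC API.
sac_hyperparams = {
    "buffer_size": int(1e6),
    "batch_size": 256,
    "gamma": 0.98,
    "tau": 0.02,
    "train_freq": 64,
    "gradient_steps": 64,
    "learning_rate": linear_schedule(7.3e-4),   # "LIN. SCHED. 7.3E-4"
    "policy_kwargs": {"net_arch": [256, 256]},  # "256 x 2"
    "use_sde": True,                            # state-dependent exploration
}

total_timesteps = int(1e6)
```

Under these assumptions the dict could be splatted into `SAC("MlpPolicy", env, **sac_hyperparams)` before calling `model.learn(total_timesteps)`.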