Efficient Imitation under Misspecification
Authors: Nicolas Espinosa Dice, Sanjiban Choudhury, Wen Sun, Gokul Swamy
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we corroborate our theory by empirically investigating several potential sources of misspecification, showing that non-expert reset distributions are preferable under misspecification. |
| Researcher Affiliation | Academia | Department of Computer Science, Cornell University EMAIL; Gokul Swamy, Robotics Institute, Carnegie Mellon University EMAIL |
| Pseudocode | Yes | Algorithm 1: Reset-Based IRL (Dual; Swamy et al., 2023); Algorithm 2: GUiding ImiTaters with Arbitrary Resets (GUITAR); Algorithm 3: Policy Search via Dynamic Programming (Bagnell et al., 2003) |
| Open Source Code | Yes | We release a forked version of Ren et al. (2024)'s code: https://nico-espinosadice.github.io/efficient-IRL/. |
| Open Datasets | Yes | For the two Antmaze-Large tasks, we use the data provided by Fu et al. (2020) as the expert demonstrations. |
| Dataset Splits | No | The paper describes collecting 100,000 state-action samples for expert and offline data and splitting expert data into 'short trajectories (of length less than 500)' and 'long trajectories (of length greater than or equal to 500)' for reset distributions. However, it does not specify explicit train/test/validation dataset splits with percentages, absolute counts, or references to standard splits for model evaluation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud computing instance types used for running the experiments. It mentions running tasks in a 'MuJoCo' environment but no associated hardware. |
| Software Dependencies | No | The paper mentions using 'Soft Actor Critic (Haarnoja et al., 2018) implementation provided by Raffin et al. (2021)', 'TD3+BC implementation of Fujimoto & Gu (2021)', and 'Optimistic Adam (Daskalakis et al., 2017)'. While specific implementations and algorithms are cited, the paper does not provide version numbers for these software libraries or environments (e.g., PyTorch version, Stable-Baselines3 version). |
| Experiment Setup | Yes | Table 1: Hyperparameters for baselines using SAC — buffer size 1e6; batch size 256; γ = 0.98; τ = 0.02; training freq. 64; gradient steps 64; learning rate: lin. sched. 7.3e-4; policy architecture 256 × 2; state-dependent exploration: true; training timesteps 1e6 |
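For concreteness, the Table 1 hyperparameters quoted above can be transcribed into a keyword dict in the style of the Stable-Baselines3 SAC constructor (the paper cites the Raffin et al., 2021 implementation). This is a hedged sketch: the key names and the linear-schedule helper follow SB3 conventions and are our assumption, not code from the paper's repository.

```python
def linear_schedule(initial_value: float):
    """SB3-style schedule: anneal linearly from initial_value to 0
    as remaining training progress goes from 1.0 to 0.0."""
    def schedule(progress_remaining: float) -> float:
        return progress_remaining * initial_value
    return schedule

# Transcription of the paper's Table 1 (SAC baselines); key names assume
# the Stable-Baselines3 SAC API.
sac_hyperparams = {
    "buffer_size": int(1e6),
    "batch_size": 256,
    "gamma": 0.98,
    "tau": 0.02,
    "train_freq": 64,
    "gradient_steps": 64,
    "learning_rate": linear_schedule(7.3e-4),   # "LIN. SCHED. 7.3E-4"
    "policy_kwargs": {"net_arch": [256, 256]},  # "256 x 2"
    "use_sde": True,                            # state-dependent exploration
}

total_timesteps = int(1e6)
```

Under these assumptions the dict could be splatted into `SAC("MlpPolicy", env, **sac_hyperparams)` before calling `model.learn(total_timesteps)`.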