Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
Authors: Arnav Kumar Jain, Harley Wiltzer, Jesse Farebrother, Irina Rish, Glen Berseth, Sanjiban Choudhury
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that our method learns from as few as a single expert demonstration and achieves improved performance on various control tasks. [...] As a result, SFM outperforms its competitors by 16% on mean normalized returns across a wide range of tasks from the DMControl suite (Tunyasuvunakool et al., 2020) as highlighted in Figure 1. |
| Researcher Affiliation | Academia | Arnav Kumar Jain1,2, Harley Wiltzer1,3, Jesse Farebrother1,3, Irina Rish1,2, Glen Berseth1,2, Sanjiban Choudhury4 — 1Mila Québec AI Institute 2Université de Montréal 3McGill University 4Cornell University |
| Pseudocode | Yes | Algorithm 1 Successor Feature Matching (SFM) [...] Pseudo 1. SFM (TD7) Network Details [...] Pseudo 2. SFM (TD3) Network Details [...] Pseudo 3. SFM (Stochastic) Network Details [...] Pseudo 4. Base Feature Network Details |
| Open Source Code | Yes | Our codebase is available at https://github.com/arnavkj1995/SFM. |
| Open Datasets | Yes | Empirical results demonstrate that our method learns from as few as a single expert demonstration and achieves improved performance on various control tasks. [...] As a result, SFM outperforms its competitors by 16% on mean normalized returns across a wide range of tasks from the DMControl suite (Tunyasuvunakool et al., 2020) as highlighted in Figure 1. |
| Dataset Splits | Yes | We evaluate our method on 10 environments from the DeepMind Control (DMC; Tunyasuvunakool et al., 2020) suite. [...] For each task, we collected expert demonstrations by training a TD3 (Fujimoto et al., 2018) agent for 1M environment steps. In our experiments, the agent is provided with a single expert demonstration which is kept fixed during the training phase. |
| Hardware Specification | Yes | Our implementation of SFM is in Jax (Bradbury et al., 2018) and it takes about 2.5 hours for one run on a single NVIDIA A100 GPU. |
| Software Dependencies | No | Our implementation of SFM is in Jax (Bradbury et al., 2018) and it takes about 2.5 hours for one run on a single NVIDIA A100 GPU. [...] Many of our hyperparameters are similar to the TD7 (Fujimoto et al., 2023) algorithm. [...] Not enough specific version numbers for software dependencies are provided. For example, 'Jax' is mentioned but without a version number specific to the implementation, and 'TD7' is an algorithm reference, not a software dependency. |
| Experiment Setup | Yes | In Table 5, we provide the details of the hyperparameters used for learning. Many of our hyperparameters are similar to the TD7 (Fujimoto et al., 2023) algorithm. Important hyperparameters include the discount factor γ for the SF network, which was tuned over the values γ = [0.98, 0.99, 0.995]; we report the ones that worked best in the table. Otherwise, our method was robust to hyperparameters such as the learning rate and batch size used during training. [...] The agents are trained for 1M environment steps and we report the mean performance across 10 seeds with 95% confidence interval shading following the best practices in RLiable (Agarwal et al., 2021). |
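For context on where the tuned discount factor γ enters, the quoted setup refers to learning a successor-feature (SF) network. A minimal illustrative sketch of the standard SF temporal-difference target (Barreto et al.'s formulation, not the authors' actual SFM code; the function name and toy values here are hypothetical):

```python
import numpy as np

def sf_td_target(phi_s, psi_next, gamma=0.99):
    """One-step SF TD target: psi(s, a) should match phi(s) + gamma * psi(s', a').

    phi_s    -- base features of state s (hypothetical 3-d toy vector below)
    psi_next -- current SF estimate at the next state-action (s', a')
    gamma    -- the discount factor tuned over [0.98, 0.99, 0.995] in the paper
    """
    return phi_s + gamma * psi_next

# Toy example with 3-dimensional base features.
phi_s = np.array([1.0, 0.0, 0.5])
psi_next = np.array([2.0, 1.0, 0.0])
target = sf_td_target(phi_s, psi_next, gamma=0.99)
```

The SF network would be regressed toward `target`, exactly as a value network is regressed toward a Bellman target, which is why γ behaves like an ordinary discount hyperparameter here.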