Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching

Authors: Arnav Kumar Jain, Harley Wiltzer, Jesse Farebrother, Irina Rish, Glen Berseth, Sanjiban Choudhury

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate that our method learns from as few as a single expert demonstration and achieves improved performance on various control tasks. [...] As a result, SFM outperforms its competitors by 16% on mean normalized returns across a wide range of tasks from the DMControl suite (Tunyasuvunakool et al., 2020) as highlighted in Figure 1.
Researcher Affiliation | Academia | Arnav Kumar Jain1,2, Harley Wiltzer1,3, Jesse Farebrother1,3, Irina Rish1,2, Glen Berseth1,2, Sanjiban Choudhury4 — 1Mila Québec AI Institute, 2Université de Montréal, 3McGill University, 4Cornell University
Pseudocode | Yes | Algorithm 1 Successor Feature Matching (SFM) [...] Pseudo 1. SFM (TD7) Network Details [...] Pseudo 2. SFM (TD3) Network Details [...] Pseudo 3. SFM (Stochastic) Network Details [...] Pseudo 4. Base Feature Network Details
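To make the quantities named in the pseudocode concrete, here is a minimal illustrative sketch (not taken from the paper's codebase) of the two core ideas behind successor feature matching: successor features ψ obey a one-step TD recursion over base features φ, and a non-adversarial objective can penalize the distance between the agent's and the expert's successor features. All function names below are hypothetical and the scalar/vector shapes are simplified for clarity:

```python
import numpy as np

def sf_td_target(phi, psi_next, gamma=0.99):
    """One-step TD target for successor features:
    psi(s, a) ≈ phi(s) + gamma * psi(s', pi(s'))."""
    return phi + gamma * psi_next

def sf_matching_loss(psi_agent, psi_expert):
    """Squared Euclidean distance between agent and expert
    successor features — a non-adversarial matching objective."""
    diff = np.asarray(psi_agent) - np.asarray(psi_expert)
    return float(np.dot(diff, diff))

# Toy check with 2-dimensional base features.
phi = np.array([1.0, 0.0])
psi_next = np.array([2.0, 1.0])
target = sf_td_target(phi, psi_next, gamma=0.5)  # [2.0, 0.5]
loss = sf_matching_loss(target, np.zeros(2))
```

In the actual algorithm these quantities are predicted by neural networks (the TD7/TD3 variants listed above) and the matching loss drives the policy update; the sketch only shows the arithmetic of the recursion and the objective.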
Open Source Code | Yes | Our codebase is available at https://github.com/arnavkj1995/SFM.
Open Datasets | Yes | Empirical results demonstrate that our method learns from as few as a single expert demonstration and achieves improved performance on various control tasks. [...] As a result, SFM outperforms its competitors by 16% on mean normalized returns across a wide range of tasks from the DMControl suite (Tunyasuvunakool et al., 2020) as highlighted in Figure 1.
Dataset Splits | Yes | We evaluate our method on 10 environments from the DeepMind Control (DMC; Tunyasuvunakool et al., 2020) suite. [...] For each task, we collected expert demonstrations by training a TD3 (Fujimoto et al., 2018) agent for 1M environment steps. In our experiments, the agent is provided with a single expert demonstration, which is kept fixed during the training phase.
Hardware Specification | Yes | Our implementation of SFM is in Jax (Bradbury et al., 2018) and it takes about 2.5 hours for one run on a single NVIDIA A100 GPU.
Software Dependencies | No | Our implementation of SFM is in Jax (Bradbury et al., 2018) and it takes about 2.5 hours for one run on a single NVIDIA A100 GPU. [...] Many of our hyperparameters are similar to the TD7 (Fujimoto et al., 2023) algorithm. [...] Not enough specific version numbers for software dependencies are provided: 'Jax' is mentioned but without a version number specific to the implementation, and 'TD7' is an algorithm reference, not a software dependency.
Experiment Setup | Yes | In Table 5, we provide the details of the hyperparameters used for learning. Many of our hyperparameters are similar to those of the TD7 (Fujimoto et al., 2023) algorithm. An important hyperparameter is the discount factor γ for the SF network; we tuned it over γ ∈ {0.98, 0.99, 0.995} and report the best-performing value in the table. Otherwise, our method was robust to hyperparameters such as the learning rate and batch size used during training. [...] The agents are trained for 1M environment steps, and we report the mean performance across 10 seeds with 95% confidence interval shading, following the best practices of RLiable (Agarwal et al., 2021).
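The evaluation protocol quoted above (mean normalized returns over 10 seeds with 95% confidence intervals) can be sketched as follows. This is a minimal NumPy approximation, not the actual RLiable implementation, which uses stratified bootstrapping across tasks; the function names and the percentile-bootstrap choice here are illustrative assumptions:

```python
import numpy as np

def normalized_return(returns, random_score, expert_score):
    """Rescale raw returns so a random policy scores 0 and the expert scores 1."""
    return (np.asarray(returns, dtype=float) - random_score) / (expert_score - random_score)

def bootstrap_ci(seed_scores, n_boot=10_000, alpha=0.05, rng_seed=0):
    """Percentile-bootstrap 95% CI for the mean score across seeds
    (a simplified stand-in for RLiable's stratified bootstrap)."""
    rng = np.random.default_rng(rng_seed)
    scores = np.asarray(seed_scores, dtype=float)
    # Resample seeds with replacement and take the mean of each resample.
    resampled = rng.choice(scores, size=(n_boot, scores.size), replace=True)
    means = resampled.mean(axis=1)
    lo, hi = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return lo, hi
```

For example, with per-seed normalized returns of [0.4, 0.5, 0.6], `bootstrap_ci` returns an interval around the mean of 0.5, which would be drawn as the shaded band in the learning curves.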