Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching
Authors: Arnav Kumar Jain, Harley Wiltzer, Jesse Farebrother, Irina Rish, Glen Berseth, Sanjiban Choudhury
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that our method learns from as few as a single expert demonstration and achieves improved performance on various control tasks. [...] As a result, SFM outperforms its competitors by 16% on mean normalized returns across a wide range of tasks from the DMControl suite (Tunyasuvunakool et al., 2020) as highlighted in Figure 1. |
| Researcher Affiliation | Academia | Arnav Kumar Jain1,2, Harley Wiltzer1,3, Jesse Farebrother1,3, Irina Rish1,2, Glen Berseth1,2, Sanjiban Choudhury4 — 1Mila Québec AI Institute 2Université de Montréal 3McGill University 4Cornell University |
| Pseudocode | Yes | Algorithm 1 Successor Feature Matching (SFM) [...] Pseudo 1. SFM (TD7) Network Details [...] Pseudo 2. SFM (TD3) Network Details [...] Pseudo 3. SFM (Stochastic) Network Details [...] Pseudo 4. Base Feature Network Details |
| Open Source Code | Yes | Our codebase is available at https://github.com/arnavkj1995/SFM. |
| Open Datasets | Yes | Empirical results demonstrate that our method learns from as few as a single expert demonstration and achieves improved performance on various control tasks. [...] As a result, SFM outperforms its competitors by 16% on mean normalized returns across a wide range of tasks from the DMControl suite (Tunyasuvunakool et al., 2020) as highlighted in Figure 1. |
| Dataset Splits | Yes | We evaluate our method on 10 environments from the DeepMind Control (DMC; Tunyasuvunakool et al., 2020) suite. [...] For each task, we collected expert demonstrations by training a TD3 (Fujimoto et al., 2018) agent for 1M environment steps. In our experiments, the agent is provided with a single expert demonstration which is kept fixed during the training phase. |
| Hardware Specification | Yes | Our implementation of SFM is in Jax (Bradbury et al., 2018) and it takes about 2.5 hours for one run on a single NVIDIA A100 GPU. |
| Software Dependencies | No | Our implementation of SFM is in Jax (Bradbury et al., 2018) and it takes about 2.5 hours for one run on a single NVIDIA A100 GPU. [...] Many of our hyperparameters are similar to the TD7 (Fujimoto et al., 2023) algorithm. [...] Not enough specific version numbers for software dependencies are provided. For example, 'Jax' is mentioned but without a version number specific to the implementation, and 'TD7' is an algorithm reference, not a software dependency. |
| Experiment Setup | Yes | In Table 5, we provide the details of the hyperparameters used for learning. Many of our hyperparameters are similar to the TD7 (Fujimoto et al., 2023) algorithm. Important hyperparameters include the discount factor γ for the SF network, which was tuned over the values γ = [0.98, 0.99, 0.995]; we report the ones that worked best in the table. Otherwise, our method was robust to hyperparameters such as the learning rate and batch size used during training. [...] The agents are trained for 1M environment steps and we report the mean performance across 10 seeds with 95% confidence interval shading following the best practices in RLiable (Agarwal et al., 2021). |
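For context on where the tuned discount factor γ enters, the quoted setup refers to learning a successor-feature (SF) network. A minimal illustrative sketch of the standard SF temporal-difference target (Barreto et al.'s formulation, not the authors' actual SFM code; the function name and toy values here are hypothetical):

```python
import numpy as np

def sf_td_target(phi_s, psi_next, gamma=0.99):
    """One-step SF TD target: psi(s, a) should match phi(s) + gamma * psi(s', a').

    phi_s    -- base features of state s (hypothetical 3-d toy vector below)
    psi_next -- current SF estimate at the next state-action (s', a')
    gamma    -- the discount factor tuned over [0.98, 0.99, 0.995] in the paper
    """
    return phi_s + gamma * psi_next

# Toy example with 3-dimensional base features.
phi_s = np.array([1.0, 0.0, 0.5])
psi_next = np.array([2.0, 1.0, 0.0])
target = sf_td_target(phi_s, psi_next, gamma=0.99)
```

The SF network would be regressed toward `target`, exactly as a value network is regressed toward a Bellman target, which is why γ behaves like an ordinary discount hyperparameter here.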