Multi-Source Transfer Learning for Deep Model-Based Reinforcement Learning
Authors: Remo Sasso, Matthia Sabatelli, Marco A. Wiering
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We support our claims with extensive and challenging cross-domain transfer learning experiments for visual control, demonstrating resilience to differences in state-action spaces and reward functions. |
| Researcher Affiliation | Academia | Remo Sasso EMAIL Queen Mary University of London Matthia Sabatelli EMAIL University of Groningen Marco A. Wiering EMAIL University of Groningen |
| Pseudocode | Yes | See Appendix F for further details and pseudocode of the Dreamer algorithm. Algorithm 1 presents pseudocode of the original Dreamer algorithm (adapted from Hafner et al. (2020)) with seed episodes S, collect interval C, batch size B, sequence length L, imagination horizon H, and learning rate α. |
| Open Source Code | No | This work builds upon the code base of Dreamer: https://github.com/danijar/dreamer. The paper does not explicitly state that the authors' specific implementation of the proposed techniques is open-source or provide a link to their own repository. |
| Open Datasets | Yes | In this paper, locomotion and pendulum balancing environments from PyBullet (Coumans & Bai, 2016–2021) are used for experiments. The locomotion environments task the agent with walking to a target point 1 kilometer from the starting position as quickly as possible. Each environment has a different entity with a different number of limbs, and therefore has different state-action spaces and transition functions. |
| Dataset Splits | Yes | For each method, we run multi-source transfer learning experiments using a different set of N source tasks for each of the target environments (Appendix B). The source tasks for a given set were selected such that each source set for a given target environment includes at least one task from a different environment type, i.e., a pendulum task for a locomotion target and vice versa. Similarly, each source set contains at least one task of the same environment type. We also ran preliminary experiments (3 random seeds) for sets with N ∈ {2, 3} to observe potential performance differences resulting from different N, but we found no significant differences (Appendix C). Table 3: Source-target combinations used for FTL and MMTL experiments, using environments Half Cheetah (Cheetah), Hopper, Walker2D, Inverted Pendulum Swingup (Inv Pend), Inverted Double Pendulum Swingup (Inv Db Pend), and Ant. |
| Hardware Specification | Yes | We used a single Nvidia V100 GPU for each training run, taking about 6 hours per 1 million environment steps. |
| Software Dependencies | No | The paper mentions building upon the Dreamer codebase and using PyBullet environments, but it does not specify version numbers for these or other software dependencies. |
| Experiment Setup | Yes | For each run, we train the FTL and MMTL target agents for 1 million environment steps and compare them to a baseline Dreamer agent that learns from scratch for 1 million environment steps, in order to evaluate the sample efficiency gains of the transfer learning approaches. FTL is evaluated by training multi-task agents for 2 million environment steps in a single run, after which we transfer the parameters to the target agent as described in Section 4.1. We use a fraction of λ = 0.2 for FTL, as we observed in preliminary experiments that the largest performance gains occur in the range λ ∈ [0.1, 0.3] (see Appendix E). For creating the UFS for MMTL, we train a multi-task agent on the Hopper and Inverted Pendulum tasks for 2 million environment steps. |
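The fractional transfer with λ = 0.2 quoted in the setup row can be sketched as below. This is a hedged illustration only: the function name `fractional_transfer`, the dictionary layout of parameters, and the random-initialization scale are assumptions for the sketch, not the authors' actual implementation, which builds on the Dreamer codebase.

```python
import numpy as np


def fractional_transfer(source_params, rng, fraction=0.2, init_scale=0.1):
    """Sketch of fractional transfer learning (FTL): initialize each
    target weight tensor as a fresh random init plus a fraction λ of
    the corresponding multi-task source weights.

    source_params: dict mapping parameter names to numpy arrays
    fraction: the λ in the paper (largest gains reported for λ in [0.1, 0.3])
    """
    return {
        name: init_scale * rng.standard_normal(w.shape) + fraction * w
        for name, w in source_params.items()
    }


# Hypothetical usage with toy parameter shapes (not the Dreamer model):
rng = np.random.default_rng(0)
source = {"dense/kernel": np.ones((4, 8)), "dense/bias": np.ones(8)}
target = fractional_transfer(source, rng, fraction=0.2)
```

With `fraction=0.0` this reduces to training from scratch (the baseline in the table), which is one reason a single λ knob is a convenient way to interpolate between full transfer and no transfer.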