Multi-Source Transfer Learning for Deep Model-Based Reinforcement Learning

Authors: Remo Sasso, Matthia Sabatelli, Marco A. Wiering

TMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We support our claims with extensive and challenging cross-domain experiments for visual control. The proposed methods are extensively evaluated in challenging cross-domain transfer learning experiments, demonstrating resilience to differences in state-action spaces and reward functions.
Researcher Affiliation | Academia | Remo Sasso EMAIL, Queen Mary University of London; Matthia Sabatelli EMAIL, University of Groningen; Marco A. Wiering EMAIL, University of Groningen
Pseudocode | Yes | See Appendix F for further details and pseudocode of the Dreamer algorithm. Algorithm 1 presents pseudocode of the original Dreamer algorithm (adapted from Hafner et al. (2020)) with seed episodes S, collect interval C, batch size B, sequence length L, imagination horizon H, and learning rate α.
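The hyperparameters named in that pseudocode map onto a training loop roughly like the following. This is a minimal schematic sketch of a Dreamer-style loop as summarized above, not the authors' implementation: `collect_episode` and `sample_sequences` are hypothetical stand-ins, and the world-model and actor-critic gradient updates are indicated only by comments.

```python
import random

def collect_episode(policy, env_len=20):
    # Hypothetical stand-in: one episode of (observation, action, reward) tuples.
    return [(random.random(), policy(), random.random()) for _ in range(env_len)]

def sample_sequences(dataset, B, L):
    # Draw B subsequences of length L from stored episodes.
    batch = []
    for _ in range(B):
        ep = random.choice(dataset)
        start = random.randrange(max(1, len(ep) - L + 1))
        batch.append(ep[start:start + L])
    return batch

def dreamer_loop(S=2, C=5, B=4, L=8, H=15, alpha=6e-4, total_steps=100):
    """Schematic loop: S seed episodes, collect interval C, batch size B,
    sequence length L, imagination horizon H, learning rate alpha."""
    policy = lambda: random.random()
    dataset = [collect_episode(policy) for _ in range(S)]  # seed data
    env_steps, updates = 0, 0
    while env_steps < total_steps:
        for _ in range(C):  # C gradient updates per environment interaction
            batch = sample_sequences(dataset, B, L)
            # Here the world model would be trained on `batch` (reconstruction
            # and KL losses), and the actor/critic on imagined rollouts of
            # horizon H, each with learning rate alpha.
            updates += 1
        episode = collect_episode(policy)  # interact with the environment
        dataset.append(episode)
        env_steps += len(episode)
    return updates, len(dataset)
```

The loop alternates C model updates with one episode of environment interaction, which is the structure the listed hyperparameters parameterize.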
Open Source Code | No | This work builds upon the code base of Dreamer: https://github.com/danijar/dreamer. The paper does not explicitly state that the authors' specific implementation of the proposed techniques is open-source or provide a link to their own repository.
Open Datasets | Yes | In this paper, locomotion and pendulum balancing environments from PyBullet (Coumans & Bai, 2016-2021) are used for experiments. The goal in the locomotion environments is to walk to a target point 1 kilometer from the starting position as quickly as possible. Each environment features a different entity with a different number of limbs, and therefore has different state-action spaces and transition functions.
Dataset Splits | Yes | For each method, we run multi-source transfer learning experiments using a different set of N source tasks for each of the target environments (Appendix B). The source tasks for a given set were selected such that each source set for a given target environment includes at least one task from a different environment type, i.e., a pendulum task for a locomotion target and vice versa. Similarly, each source set contains at least one task of the same environment type. We also ran preliminary experiments (3 random seeds) for sets consisting of N = [2, 3] to observe potential performance differences resulting from different N, but we found no significant differences (Appendix C). Table 3: Source-target combinations used for FTL and MMTL experiments, using environments Half Cheetah (Cheetah), Hopper, Walker2D, Inverted Pendulum Swingup (Inv Pend), Inverted Double Pendulum Swingup (Inv Db Pend), and Ant.
Hardware Specification | Yes | We used a single Nvidia V100 GPU for each training run, taking about 6 hours per 1 million environment steps.
Software Dependencies | No | The paper mentions building upon the Dreamer codebase and using PyBullet environments, but it does not specify version numbers for these or other software dependencies.
Experiment Setup | Yes | For each run, we train the FTL and MMTL target agents for 1 million environment steps and compare them to a baseline Dreamer agent that learns from scratch for 1 million environment steps, in order to evaluate the sample efficiency gains of the transfer learning approaches. FTL is evaluated by training multi-task agents for 2 million environment steps in a single run, after which we transfer the parameters to the target agent as described in Section 4.1. We use a fraction of λ = 0.2 for FTL, as we observed in preliminary experiments that the largest performance gains occur in the range λ ∈ [0.1, 0.3] (see Appendix E). For creating the UFS for MMTL, we train a multi-task agent on the Hopper and Inverted Pendulum tasks for 2 million environment steps.
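For illustration, the fraction λ used by FTL can be read as an additive blend of source-task parameters into freshly initialized target parameters. The sketch below assumes the additive formulation W_target = W_init + λ · W_source, where λ = 0 recovers learning from scratch and λ = 1 a full parameter copy; `fractional_transfer` is a hypothetical helper for exposition, not a function from the paper's codebase.

```python
def fractional_transfer(source_params, fresh_params, lam=0.2):
    """Blend a fraction `lam` of the source weights into freshly
    initialized target weights (assumed additive formulation:
    W_target = W_init + lam * W_source)."""
    return [f + lam * s for f, s in zip(fresh_params, source_params)]
```

With the paper's λ = 0.2, only a fifth of each source weight is carried over, which keeps the target network close to its random initialization while still biasing it toward the multi-task solution.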