Adaptation Augmented Model-based Policy Optimization

Authors: Jian Shen, Hang Lai, Minghuan Liu, Han Zhao, Yong Yu, Weinan Zhang

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on challenging continuous control tasks show that FAMPO and IAMPO, coupled with our model usage technique, achieves superior performance against baselines, which demonstrates the effectiveness of the proposed methods. Keywords: Model-based reinforcement learning, distribution shift, occupancy measure, Integral Probability Metric, importance sampling
Researcher Affiliation | Academia | Jian Shen, Hang Lai, Minghuan Liu, Han Zhao, Yong Yu, Weinan Zhang. Department of Computer Science, Shanghai Jiao Tong University; Department of Computer Science, University of Illinois, Urbana-Champaign
Pseudocode | Yes | Algorithm 1 FAMPO ... Algorithm 2 IAMPO
Open Source Code | No | The paper states: "We implement all our experiments using TensorFlow." However, it does not explicitly state that the code for the described methodology is released or provide a link to a code repository.
Open Datasets | Yes | We evaluate our methods and other baselines on six MuJoCo continuous control tasks from OpenAI Gym (Brockman et al., 2016)
Dataset Splits | No | The paper describes dynamic data collection into environment and model buffers (D_env and D_model) and how samples are drawn from them for training. It does not provide specific fixed dataset splits (e.g., percentages or counts for training, validation, and testing sets) in the traditional supervised learning sense for reproducibility.
Hardware Specification | No | The paper mentions implementing experiments using TensorFlow and using MuJoCo environments, but it does not specify any particular hardware components (e.g., GPU models, CPU types, or cloud computing specifications) used for conducting the experiments.
Software Dependencies | No | The paper states: "We implement all our experiments using TensorFlow." While a software library is mentioned, a specific version number for TensorFlow or any other software dependency is not provided.
Experiment Setup | Yes | Other important hyperparameters used in our methods are chosen by grid search and detailed hyperparameter settings can be found in Appendix E. Table 1: Common hyperparameters for FAMPO and IAMPO. Table 2: Distinct hyperparameters for FAMPO. Table 3: Distinct hyperparameters for IAMPO.