Dynamics Adapted Imitation Learning
Authors: Zixuan Liu, Liu Liu, Bingzhe Wu, Lanqing Li, Xueqian Wang, Bo Yuan, Peilin Zhao
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiment evaluation validates that our method achieves superior results on high dimensional continuous control tasks, compared to existing imitation learning methods. We validate the effectiveness of DYNAIL on a variety of high-dimensional continuous control benchmarks with dynamics variations. Section 5 and Appendix show that our algorithm achieves superior results compared to state-of-the-art imitation learning methods. |
| Researcher Affiliation | Collaboration | 1Tsinghua University, 2Tencent AI Lab, 3Research Institute of Tsinghua University in Shenzhen, 4Zhejiang Lab |
| Pseudocode | Yes | Algorithm 1 Dynamics Adapted Imitation Learning (DYNAIL) |
| Open Source Code | Yes | To further demonstrate the efficacy of our methods, we provide experiment videos in https://github.com/Panda-Shawn/DYNAIL |
| Open Datasets | Yes | Custom ant is basically the same as ant from OpenAI Gym (Brockman et al., 2016) except for joint gear ratios. With lower joint gear ratios, the robot flips less often and the agent learns fast. We refer to this environment as Custom Ant-v0. Low Friction Quadruped. This environment is based on the source domain "quadruped" with "realwalk" task from realworldrl-suite (Dulac-Arnold et al., 2020). |
| Dataset Splits | No | We use 40 trajectories collected by the expert as demonstrations. For all the experiments, we use the same pre-collected 40 expert trajectories on the source domain (Custom Ant-v0) as expert demonstrations. |
| Hardware Specification | No | The paper does not provide specific hardware details for running the experiments. |
| Software Dependencies | No | We use PPO (Schulman et al., 2017) for the generator in AIL framework except for humanoid task where we use SAC (Haarnoja et al., 2018) for the generator, to optimize the policy and use 10 parallel environments to collect transitions on target domains. For all the experiments, the expert demonstrations are collected by using RL algorithms in Stable Baselines3 (Raffin et al., 2019). |
| Experiment Setup | Yes | The discriminator Dθ and the classifiers qsa and qsas have the same structure of hidden layers, 2 layers of 256 units each, and a normalized input layer. We use ReLU as the activation after each hidden layer. In all experiments, the discounting factor is 0.99. A key hyperparameter for our method is η, which serves as a tuning regularization, and we defer the full ablation study on η to Appendix B.1. The hyperparameters are shown in Table 1. |
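For concreteness, the network shape quoted in the Experiment Setup row (a normalized input layer, two hidden layers of 256 units each with ReLU, and a scalar logit head shared by the discriminator and classifiers) can be sketched as below. This is a minimal NumPy sketch, not the authors' implementation: the function names (`init_mlp`, `forward`) and the choice of layer normalization for the "normalized input layer" are assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Assumed reading of "normalized input layer": standardize each input
    # vector to zero mean and unit variance across its features.
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def relu(x):
    # ReLU activation, applied after each hidden layer per the setup.
    return np.maximum(x, 0.0)

def init_mlp(in_dim, hidden=256, out_dim=1, seed=0):
    # Two hidden layers of 256 units each, plus a scalar output head.
    rng = np.random.default_rng(seed)
    dims = [in_dim, hidden, hidden, out_dim]
    return [
        (rng.standard_normal((d_in, d_out)) * np.sqrt(2.0 / d_in),
         np.zeros(d_out))
        for d_in, d_out in zip(dims[:-1], dims[1:])
    ]

def forward(params, x):
    # Normalized input -> ReLU hidden layers -> raw logit.
    h = layer_norm(x)
    for W, b in params[:-1]:
        h = relu(h @ W + b)
    W, b = params[-1]
    return h @ W + b  # logit; a sigmoid would map it to a probability
```

For example, with a 10-dimensional state-action input, `forward(init_mlp(10), np.zeros((4, 10)))` returns one logit per row of the batch, shape `(4, 1)`.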