Sample-efficient Adversarial Imitation Learning
Authors: Dahuin Jung, Hyungyu Lee, Sungroh Yoon
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically and empirically observe that making an informative feature manifold with less sample complexity significantly improves the performance of imitation learning. The proposed method shows a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs. Moreover, we conduct comprehensive ablations and additional experiments using demonstrations with varying optimality to provide insights into a range of factors. |
| Researcher Affiliation | Academia | Electrical and Computer Engineering Seoul National University, Seoul 08826, Republic of Korea Electrical and Computer Engineering Interdisciplinary Program in Artificial Intelligence Seoul National University, Seoul 08826, Republic of Korea |
| Pseudocode | Yes | Algorithm 1 Sample-efficient Adversarial Imitation Learning Algorithm 2 Pseudo-code of Ours + 2IWIL (Wu et al., 2019) Algorithm 3 Pseudo-code of Ours + 2IWIL + Manifold Mixup (Verma et al., 2019a) |
| Open Source Code | No | No explicit statement or link to the authors' own source code repository is provided in the paper. The text mentions that the code is based on PyTorch and Python libraries, but not that the authors' implementation is publicly released. |
| Open Datasets | Yes | The efficacy of the proposed approach is assessed using MuJoCo (Todorov et al., 2012) and Atari RAM of OpenAI Gym (Brockman et al., 2016), where each benchmark is allowed less than 100 expert state-action pairs. |
| Dataset Splits | Yes | We evaluated the proposed method on MuJoCo (Todorov et al., 2012) and Atari RAM of OpenAI Gym (Brockman et al., 2016), where each benchmark is allowed less than 100 expert state-action pairs. ... We tested the sample efficiency of the proposed method in a scenario where optimal demonstration samples of less than one full trajectory are available (≤ 100). Expert demonstrations with optimalities of 25%, 50%, and 75% represent imperfect demonstrations, i.e., a mixture of optimal and non-optimal demonstrations. ... Table 9: Specification and the number of used demonstrations of each continuous control benchmark in the scenario of perfect expert demonstrations. ... HalfCheetah-v2 ... 100 ... Walker2d-v2 ... 100 ... Ant-v2 ... 100 |
| Hardware Specification | Yes | For experimental settings, we used GTX 1080 Ti for GPUs, Intel i7-6850K for CPUs, and Ubuntu 18.04 for OS. |
| Software Dependencies | No | Our code is based on PyTorch (Paszke et al., 2019) and Python libraries. While PyTorch is mentioned and cited, specific version numbers for PyTorch and Python are not provided, preventing full reproducibility of the software environment. |
| Experiment Setup | Yes | For hyperparameters in all runs, the total number of epochs for Swimmer and Hopper is 3,000; for BeamRider, SpaceInvaders, HalfCheetah, and Walker2d it is 5,000; and for Ant it is 8,000. Please refer to Table 8 for other hyperparameters. We set λF = 1, λS = 100, and λA = 1 for matching loss scale. Table 8: Base hyperparameters used for all benchmarks. γ = 0.995; generalized advantage estimation = 0.97; N = 5,000; learning rate (all networks except for value network) = 1e-3; learning rate (value network) = 3e-4; batch size (RERP) = 256; batch size (TRPO) = 128; batch size (GAIL) = 5,000; optimizer (all networks) = Adam; τ = 0.1; λF = 1; λS = 100; λA = 1. |
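For convenience, the base hyperparameters from Table 8 and the per-benchmark epoch counts quoted above can be gathered into a single configuration sketch. This is a hypothetical reconstruction: the key names are illustrative and are not taken from the authors' (unreleased) implementation.

```python
# Illustrative configuration assembled from Table 8 of the paper;
# key names are assumptions, not the authors' own identifiers.

BASE_HPARAMS = {
    "gamma": 0.995,            # discount factor γ
    "gae_lambda": 0.97,        # generalized advantage estimation
    "rollout_steps": 5000,     # N
    "lr": 1e-3,                # all networks except the value network
    "lr_value": 3e-4,          # value network
    "batch_size_rerp": 256,
    "batch_size_trpo": 128,
    "batch_size_gail": 5000,
    "optimizer": "Adam",       # used for all networks
    "tau": 0.1,
    "lambda_F": 1,             # matching loss scales λF, λS, λA
    "lambda_S": 100,
    "lambda_A": 1,
}

# Total training epochs per benchmark, as stated in the setup excerpt.
TOTAL_EPOCHS = {
    "Swimmer": 3000,
    "Hopper": 3000,
    "BeamRider": 5000,
    "SpaceInvaders": 5000,
    "HalfCheetah": 5000,
    "Walker2d": 5000,
    "Ant": 8000,
}

def config_for(env_name: str) -> dict:
    """Merge the shared hyperparameters with the per-environment epoch count."""
    return {**BASE_HPARAMS, "total_epochs": TOTAL_EPOCHS[env_name]}
```

Collecting the settings this way makes it easy to spot which values are shared across benchmarks and which (only the epoch budget) vary per environment.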
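The low-data setting described under Dataset Splits (at most 100 expert state-action pairs, i.e., less than one full trajectory) can be sketched as a simple subsampling step. This helper is an assumption for illustration only; the paper does not specify how the pairs were drawn.

```python
import numpy as np

def subsample_expert_pairs(states, actions, n_pairs=100, seed=0):
    """Draw at most `n_pairs` expert state-action pairs without
    replacement, mirroring the paper's <=100-pair low-data setting.
    Hypothetical helper; not from the authors' implementation."""
    rng = np.random.default_rng(seed)
    n = min(n_pairs, len(states))
    idx = rng.choice(len(states), size=n, replace=False)
    return states[idx], actions[idx]
```

Sampling without replacement ensures the budget of 100 pairs is not inflated by duplicates, which matters when measuring sample efficiency.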