Sample-efficient Adversarial Imitation Learning
Authors: Dahuin Jung, Hyungyu Lee, Sungroh Yoon
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We theoretically and empirically observe that making an informative feature manifold with less sample complexity significantly improves the performance of imitation learning. The proposed method shows a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs. Moreover, we conduct comprehensive ablations and additional experiments using demonstrations with varying optimality to provide insights into a range of factors. |
| Researcher Affiliation | Academia | Electrical and Computer Engineering Seoul National University, Seoul 08826, Republic of Korea Electrical and Computer Engineering Interdisciplinary Program in Artificial Intelligence Seoul National University, Seoul 08826, Republic of Korea |
| Pseudocode | Yes | Algorithm 1 Sample-efficient Adversarial Imitation Learning Algorithm 2 Pseudo-code of Ours + 2IWIL (Wu et al., 2019) Algorithm 3 Pseudo-code of Ours + 2IWIL + Manifold Mixup (Verma et al., 2019a) |
| Open Source Code | No | No explicit statement or link to the authors' own source code repository is provided in the paper. The text mentions that the code is based on PyTorch and Python libraries, but not that the authors' implementation is publicly released. |
| Open Datasets | Yes | The efficacy of the proposed approach is assessed using MuJoCo (Todorov et al., 2012) and Atari RAM of OpenAI Gym (Brockman et al., 2016), where each benchmark is allowed less than 100 expert state-action pairs. |
| Dataset Splits | Yes | We evaluated the proposed method on MuJoCo (Todorov et al., 2012) and Atari RAM of OpenAI Gym (Brockman et al., 2016), where each benchmark is allowed less than 100 expert state-action pairs. ... We tested the sample efficiency of the proposed method in a scenario where optimal demonstration samples of less than one full trajectory are available (≤ 100). Expert demonstrations with optimalities of 25%, 50%, and 75% represent imperfect demonstrations, i.e., a mixture of optimal and non-optimal demonstrations. ... Table 9: Specification and the number of used demonstrations of each continuous control benchmark in the scenario of perfect expert demonstrations. ... HalfCheetah-v2 ... 100 ... Walker2d-v2 ... 100 ... Ant-v2 ... 100 |
| Hardware Specification | Yes | For experimental settings, we used GTX 1080 Ti for GPUs, Intel i7-6850K for CPUs, and Ubuntu 18.04 for OS. |
| Software Dependencies | No | Our code is based on PyTorch (Paszke et al., 2019) and Python libraries. While PyTorch is mentioned and cited, specific version numbers for PyTorch and Python are not provided, preventing full reproducibility of the software environment. |
| Experiment Setup | Yes | For hyperparameters in all runs, the total number of epochs for Swimmer and Hopper is 3,000; for BeamRider, SpaceInvaders, HalfCheetah, and Walker2d it is 5,000; and for Ant it is 8,000. Please refer to Table 8 for other hyperparameters. We set λF = 1, λS = 100, and λA = 1 for matching loss scale. Table 8: Base hyperparameters used for all benchmarks. γ = 0.995; generalized advantage estimation = 0.97; N = 5,000; learning rate (all networks except for value network) = 1e-3; learning rate (value network) = 3e-4; batch size (RERP) = 256; batch size (TRPO) = 128; batch size (GAIL) = 5,000; optimizer (all networks) = Adam; τ = 0.1; λF = 1; λS = 100; λA = 1. |
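For convenience, the base hyperparameters from Table 8 and the per-benchmark epoch counts quoted above can be gathered into a single configuration sketch. This is a hypothetical reconstruction: the key names are illustrative and are not taken from the authors' (unreleased) implementation.

```python
# Illustrative configuration assembled from Table 8 of the paper;
# key names are assumptions, not the authors' own identifiers.

BASE_HPARAMS = {
    "gamma": 0.995,            # discount factor γ
    "gae_lambda": 0.97,        # generalized advantage estimation
    "rollout_steps": 5000,     # N
    "lr": 1e-3,                # all networks except the value network
    "lr_value": 3e-4,          # value network
    "batch_size_rerp": 256,
    "batch_size_trpo": 128,
    "batch_size_gail": 5000,
    "optimizer": "Adam",       # used for all networks
    "tau": 0.1,
    "lambda_F": 1,             # matching loss scales λF, λS, λA
    "lambda_S": 100,
    "lambda_A": 1,
}

# Total training epochs per benchmark, as stated in the setup excerpt.
TOTAL_EPOCHS = {
    "Swimmer": 3000,
    "Hopper": 3000,
    "BeamRider": 5000,
    "SpaceInvaders": 5000,
    "HalfCheetah": 5000,
    "Walker2d": 5000,
    "Ant": 8000,
}

def config_for(env_name: str) -> dict:
    """Merge the shared hyperparameters with the per-environment epoch count."""
    return {**BASE_HPARAMS, "total_epochs": TOTAL_EPOCHS[env_name]}
```

Collecting the settings this way makes it easy to spot which values are shared across benchmarks and which (only the epoch budget) vary per environment.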
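The low-data setting described under Dataset Splits (at most 100 expert state-action pairs, i.e., less than one full trajectory) can be sketched as a simple subsampling step. This helper is an assumption for illustration only; the paper does not specify how the pairs were drawn.

```python
import numpy as np

def subsample_expert_pairs(states, actions, n_pairs=100, seed=0):
    """Draw at most `n_pairs` expert state-action pairs without
    replacement, mirroring the paper's <=100-pair low-data setting.
    Hypothetical helper; not from the authors' implementation."""
    rng = np.random.default_rng(seed)
    n = min(n_pairs, len(states))
    idx = rng.choice(len(states), size=n, replace=False)
    return states[idx], actions[idx]
```

Sampling without replacement ensures the budget of 100 pairs is not inflated by duplicates, which matters when measuring sample efficiency.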