Deep Implicit Imitation Reinforcement Learning in Heterogeneous Action Settings
Authors: Iason Chrysomallis, Georgios Chalkiadakis, Ioannis Papamichail, Markos Papageorgiou
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results confirm that the benefits associated with deep implicit imitation, namely accelerated training and improved performance over the mentor agent, are achieved even in contexts with non-homogeneous action settings. Our experiments are conducted both in a maze environment similar to those used for the study of tabular implicit imitation RL (Price and Boutilier 1999, 2001, 2003), and in a challenging force-actuated navigation environment (Fu et al. 2020). |
| Researcher Affiliation | Academia | Iason Chrysomallis, Georgios Chalkiadakis, Ioannis Papamichail, Markos Papageorgiou Technical University of Crete, Chania, Greece EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Deep k-n step repair |
| Open Source Code | No | The paper mentions a technical appendix for additional experiments but does not explicitly state that the source code for the methodology described in the paper is available or provide a link to a code repository. |
| Open Datasets | Yes | Specifically, we utilize the 2D Maze environment from OpenAI Gym (Brockman et al. 2016), introducing a maze of size 30×30 where the state space comprises the agent's current position. Specifically, we employ D4RL's Point Maze (Fu et al. 2020), an environment based on the MuJoCo simulation physics engine (Todorov, Erez, and Tassa 2012) |
| Dataset Splits | No | The paper describes how episodes are defined and when an environment is considered solved, but it does not provide specific train/test/validation dataset splits for reproducibility. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using frameworks like DQN, OpenAI Gym, and MuJoCo, but does not provide specific version numbers for these or any other software libraries or programming languages used. |
| Experiment Setup | No | The paper mentions hyperparameters like 'ε_infeas' and search depths 'k' and 'n', but describes their selection as 'domain-specific' and suggests 'low values' without specifying the exact numerical values used in their experiments. It does not provide details on learning rates, batch sizes, optimizers, or training epochs. |