Cyclophobic Reinforcement Learning
Authors: Stefan Sylvius Wagner, Peter Arndt, Jan Robine, Stefan Harmeling
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Augmenting the cyclophobic intrinsic reward with a sequence of hierarchical representations based on the agent's cropped observations, we are able to achieve excellent results in the MiniGrid and MiniHack environments. Both are particularly hard, as they require complex interactions with different objects in order to be solved. Detailed comparisons with previous approaches and thorough ablation studies show that our newly proposed cyclophobic reinforcement learning is more sample-efficient than other state-of-the-art methods in a variety of tasks. |
| Researcher Affiliation | Academia | Stefan Wagner¹, Peter Arndt¹, Jan Robine², Stefan Harmeling² — ¹Heinrich Heine University Düsseldorf, ²Technical University Dortmund |
| Pseudocode | No | The paper describes its methodology using textual explanations and mathematical equations (e.g., Equation 1, 2, 3, 4, 5, 11, 12, 13, 14), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using the "open source xxhash library" for a specific function (hashing), but there is no explicit statement from the authors about releasing the source code for their own methodology described in the paper, nor is a link to a code repository provided. Furthermore, the paper mentions "In future work we seek to implement the hierarchical state representations for the neural agent" and "future work will therefore focus on incorporating cyclophobic reinforcement learning into a neural network based architecture," implying ongoing development and no immediate release of the full implementation. |
| Open Datasets | Yes | We test our method on the MiniGrid and MiniHack environments: The MiniGrid environment (Chevalier-Boisvert et al., 2018) consists of a series of procedurally generated environments... The MiniHack environment (Samvelyan et al., 2021) is a graphical version of the NetHack environment (Küttler et al., 2020). |
| Dataset Splits | No | The paper uses procedurally generated reinforcement learning environments (MiniGrid and MiniHack) and describes pretraining setups for transfer learning (e.g., "Door Key-8x8 and Multi Env"). However, it does not specify explicit training, validation, or test dataset splits in terms of percentages or sample counts, which are typical for static datasets in supervised learning contexts. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only describes the environments and experimental procedures. |
| Software Dependencies | No | The paper mentions using the "open source xxhash library" but does not provide a specific version number for this library or any other key software dependencies (e.g., Python, machine learning frameworks like PyTorch or TensorFlow, CUDA) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | C.1 Hyperparameters: Common hyperparameters: In the following we present a table with all hyperparameters used for training the environments. In general, our method has very few hyperparameters. In this case the only really tunable hyperparameters are the step size η, the random action selection coefficient ε for epsilon-greedy exploration, and the discount γ: η = 0.2, γ = 0.99, Num. views = 5. Epsilon-greedy parameter ε: ... Intrinsic reward tradeoff ρ: ... C.2 PPO Hyperparameters: GAE-lambda = 0.95, γ = 0.99, Batch size = 256, Learning rate = 0.001, Entropy coefficient = 0.01, Value loss coefficient = 0.5, Max Grad Norm = 0.5, Clipping-ε = 0.2, RMSProp-ε = 1e-8, RMSProp-α = 0.99 |
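For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a structured config. This is a minimal sketch: the dict names and key names are my own (hypothetical), and only the values are taken from the paper's Appendix C as quoted above.

```python
# Hyperparameters reported in Appendix C of the paper, gathered into
# plain Python dicts. Key names are illustrative, not the authors' code.

# Cyclophobic agent hyperparameters (Appendix C.1)
cyclophobic_hparams = {
    "step_size_eta": 0.2,    # tabular update step size η
    "discount_gamma": 0.99,  # discount factor γ
    "num_views": 5,          # number of hierarchical (cropped) views
    # ε (epsilon-greedy) and ρ (intrinsic reward tradeoff) are reported
    # per environment in the paper and are therefore omitted here.
}

# PPO baseline hyperparameters (Appendix C.2)
ppo_hparams = {
    "gae_lambda": 0.95,
    "discount_gamma": 0.99,
    "batch_size": 256,
    "learning_rate": 0.001,
    "entropy_coef": 0.01,
    "value_loss_coef": 0.5,
    "max_grad_norm": 0.5,
    "clip_epsilon": 0.2,
    "rmsprop_eps": 1e-8,
    "rmsprop_alpha": 0.99,
}
```

A config like this makes it easy to spot what a replication would still need to pin down: the per-environment ε and ρ schedules, plus the unreported hardware and software versions noted in the rows above.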