Cyclophobic Reinforcement Learning
Authors: Stefan Sylvius Wagner, Peter Arndt, Jan Robine, Stefan Harmeling
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Augmenting the cyclophobic intrinsic reward with a sequence of hierarchical representations based on the agent's cropped observations, we are able to achieve excellent results in the MiniGrid and MiniHack environments. Both are particularly hard, as they require complex interactions with different objects in order to be solved. Detailed comparisons with previous approaches and thorough ablation studies show that our newly proposed cyclophobic reinforcement learning is more sample-efficient than other state-of-the-art methods in a variety of tasks. |
| Researcher Affiliation | Academia | Stefan Wagner¹, Peter Arndt¹, Jan Robine², Stefan Harmeling² — ¹Heinrich Heine University Düsseldorf, ²Technical University Dortmund |
| Pseudocode | No | The paper describes its methodology using textual explanations and mathematical equations (e.g., Equation 1, 2, 3, 4, 5, 11, 12, 13, 14), but it does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using the "open source xxhash library" for a specific function (hashing), but there is no explicit statement from the authors about releasing the source code for their own methodology described in the paper, nor is a link to a code repository provided. Furthermore, the paper mentions "In future work we seek to implement the hierarchical state representations for the neural agent" and "future work will therefore focus on incorporating cyclophobic reinforcement learning into a neural network based architecture," implying ongoing development and no immediate release of the full implementation. |
| Open Datasets | Yes | We test our method on the MiniGrid and MiniHack environments: The MiniGrid environment (Chevalier-Boisvert et al., 2018) consists of a series of procedurally generated environments... The MiniHack environment (Samvelyan et al., 2021) is a graphical version of the NetHack environment (Küttler et al., 2020). |
| Dataset Splits | No | The paper uses procedurally generated reinforcement learning environments (MiniGrid and MiniHack) and describes pretraining setups for transfer learning (e.g., "Door Key-8x8 and Multi Env"). However, it does not specify explicit training, validation, or test dataset splits in terms of percentages or sample counts, which are typical for static datasets in supervised learning contexts. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. It only describes the environments and experimental procedures. |
| Software Dependencies | No | The paper mentions using the "open source xxhash library" but does not provide a specific version number for this library or any other key software dependencies (e.g., Python, machine learning frameworks like PyTorch or TensorFlow, CUDA) that would be needed to replicate the experiment. |
| Experiment Setup | Yes | C.1 Hyperparameters: Common hyperparameters: In the following we present a table with all hyperparameters used for training the environments. In general, our method has very few hyperparameters. In this case the only really tunable hyperparameters are the step size η, the random action selection coefficient ε for epsilon-greedy exploration, and the discount γ: η = 0.2, γ = 0.99, Num. views = 5. Epsilon-greedy parameter ε: ... Intrinsic reward tradeoff ρ: ... C.2 PPO Hyperparameters: GAE-lambda = 0.95, γ = 0.99, Batch size = 256, Learning rate = 0.001, Entropy coefficient = 0.01, Value loss coefficient = 0.5, Max Grad Norm = 0.5, Clipping-ε = 0.2, RMSProp-ε = 1e-8, RMSProp-α = 0.99 |
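For reference, the hyperparameters quoted in the Experiment Setup row can be collected into a structured config. This is a minimal sketch: the dict names and key names are my own (hypothetical), and only the values are taken from the paper's Appendix C as quoted above.

```python
# Hyperparameters reported in Appendix C of the paper, gathered into
# plain Python dicts. Key names are illustrative, not the authors' code.

# Cyclophobic agent hyperparameters (Appendix C.1)
cyclophobic_hparams = {
    "step_size_eta": 0.2,    # tabular update step size η
    "discount_gamma": 0.99,  # discount factor γ
    "num_views": 5,          # number of hierarchical (cropped) views
    # ε (epsilon-greedy) and ρ (intrinsic reward tradeoff) are reported
    # per environment in the paper and are therefore omitted here.
}

# PPO baseline hyperparameters (Appendix C.2)
ppo_hparams = {
    "gae_lambda": 0.95,
    "discount_gamma": 0.99,
    "batch_size": 256,
    "learning_rate": 0.001,
    "entropy_coef": 0.01,
    "value_loss_coef": 0.5,
    "max_grad_norm": 0.5,
    "clip_epsilon": 0.2,
    "rmsprop_eps": 1e-8,
    "rmsprop_alpha": 0.99,
}
```

A config like this makes it easy to spot what a replication would still need to pin down: the per-environment ε and ρ schedules, plus the unreported hardware and software versions noted in the rows above.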