Novelty Detection in Reinforcement Learning with World Models
Authors: Geigh Zollicoffer, Kenneth Eaton, Jonathan C Balloch, Julia Kim, Wei Zhou, Robert Wright, Mark Riedl
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method by injecting novelties into MiniGrid (Chevalier-Boisvert et al., 2018), Atari (Machado et al., 2018), and continuous DeepMind Control (DMC) (Tunyasuvunakool et al., 2020) environments. Specifically, we use the NovGrid (Balloch et al., 2022), HackAtari (Delfosse et al., 2024), and Real-World RL Suite (Dulac-Arnold et al., 2020) that provide novelties to their respective base environments. |
| Researcher Affiliation | Academia | 1Department of Mathematics, Georgia Institute of Technology, Atlanta, United States of America 2Georgia Tech Research Institute, Atlanta, United States of America 3Department of Computer Science, Georgia Institute of Technology, Atlanta, United States of America. Correspondence to: Mark Riedl <EMAIL>, Geigh Zollicoffer <EMAIL>. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks are present. The methodology is described using equations and narrative text. |
| Open Source Code | No | The paper cites the GitHub repository of a third-party framework (DreamerV2) in Table 9, but it provides no statement or link indicating that the authors' own novelty detection code has been open-sourced. |
| Open Datasets | Yes | We evaluate our method by injecting novelties into MiniGrid (Chevalier-Boisvert et al., 2018), Atari (Machado et al., 2018), and continuous DeepMind Control (DMC) (Tunyasuvunakool et al., 2020) environments. |
| Dataset Splits | No | The paper describes the evaluation protocol where agents are trained in nominal environments and then tested in novel environments for specific numbers of episodes or steps (e.g., 'capturing 300 independent and identically distributed episodes' or 'capturing 50,000 steps within various independent and identically distributed episodes'). However, it does not provide specific training, validation, or test dataset splits in terms of percentages, sample counts, or predefined files for the underlying datasets themselves. |
| Hardware Specification | Yes | All experiments can be sufficiently reproduced utilizing an NVIDIA GeForce GTX 1080 GPU with at least 8 GB of VRAM for environment complexity, an AMD Ryzen 5 5600X 6-Core Processor, and at least 50 MB for files, excluding training data, which is dependent on environment and model hyper-parameters. |
| Software Dependencies | No | The paper mentions using 'DreamerV2' as a world model framework and references its GitHub repository for default training parameters, but it does not provide specific version numbers for DreamerV2 itself or any other software dependencies such as programming languages (e.g., Python), deep learning libraries (e.g., PyTorch, TensorFlow), or CUDA. |
| Experiment Setup | Yes | Appendix F.1 provides a table titled 'Dreamer World Model Training Parameters' which lists various hyperparameters such as 'Dataset size (FIFO)', 'Batch size B', 'Sequence length L', 'Discrete latent dimensions', 'KL loss scale β', 'World model learning rate', 'Imagination horizon H', 'Discount γ', 'Actor learning rate', and 'Critic learning rate' along with their specific numerical values. |
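For concreteness, the hyperparameters named in the paper's Appendix F.1 table could be collected into a single config structure for a reproduction attempt. The sketch below is purely illustrative: the key names mirror the parameter names quoted above, but every numeric value is a placeholder, not the paper's reported setting.

```python
# Sketch of a DreamerV2-style world-model config keyed by the parameter
# names listed in the paper's Appendix F.1. All numeric values below are
# PLACEHOLDERS for illustration only, not the paper's actual settings.
dreamer_config = {
    "dataset_size_fifo": 2_000_000,   # placeholder
    "batch_size_B": 16,               # placeholder
    "sequence_length_L": 50,          # placeholder
    "discrete_latent_dims": 32,       # placeholder
    "kl_loss_scale_beta": 1.0,        # placeholder
    "world_model_lr": 3e-4,           # placeholder
    "imagination_horizon_H": 15,      # placeholder
    "discount_gamma": 0.99,           # placeholder
    "actor_lr": 8e-5,                 # placeholder
    "critic_lr": 8e-5,                # placeholder
}

def validate_config(cfg: dict) -> None:
    """Raise ValueError if any expected hyperparameter is missing or non-positive."""
    required = {
        "dataset_size_fifo", "batch_size_B", "sequence_length_L",
        "discrete_latent_dims", "kl_loss_scale_beta", "world_model_lr",
        "imagination_horizon_H", "discount_gamma", "actor_lr", "critic_lr",
    }
    missing = required - cfg.keys()
    if missing:
        raise ValueError(f"missing hyperparameters: {sorted(missing)}")
    for key, value in cfg.items():
        if not value > 0:
            raise ValueError(f"{key} must be positive, got {value}")

validate_config(dreamer_config)
```

A validation step like this catches an incomplete or mistyped config before training starts, which matters for reproduction runs where the parameter list comes from a paper appendix rather than shipped code.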