On Rollouts in Model-Based Reinforcement Learning

Authors: Bernd Frauenknecht, Devdutt Subhasish, Friedrich Solowjow, Sebastian Trimpe

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the capabilities of Infoprop in the Infoprop-Dyna algorithm, reporting state-of-the-art performance in Dyna-style MBRL on common MuJoCo benchmark tasks while substantially increasing rollout length and data quality."
Researcher Affiliation | Academia | Bernd Frauenknecht, Devdutt Subhasish, Friedrich Solowjow, and Sebastian Trimpe; Institute for Data Science in Mechanical Engineering, RWTH Aachen University, Aachen, 52062, Germany; EMAIL
Pseudocode | Yes | Pseudocode is provided in Algorithm 2 of Appendix C, along with Algorithm 1 (Infoprop) and Algorithm 3 (Infoprop-Dyna; pseudocode adapted from Janner et al. (2019)).
Open Source Code | Yes | https://github.com/Data-Science-in-Mechanical-Engineering/infoprop
Open Datasets | Yes | "To demonstrate the benefits of the Infoprop mechanism, we compare Infoprop-Dyna to state-of-the-art Dyna-style MBRL algorithms on MuJoCo (Todorov et al., 2012) benchmark tasks."
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits as percentages, sample counts, or file names. It discusses environment and model replay buffers (D_env and D_mod) but no explicit data partitioning with specific ratios.
Hardware Specification | No | The paper states: "Further, the authors gratefully acknowledge the computing time provided to them at the NHR Center NHR4CES at RWTH Aachen University (project number p0022301)." This does not specify concrete hardware details such as GPU models, CPU models, or memory amounts.
Software Dependencies | No | The paper mentions "We used Weights&Biases for logging our experiments" but does not specify a version number for Weights&Biases or any other software dependency.
Experiment Setup | Yes | "The respective hyperparameters for Infoprop-Dyna on MuJoCo are given below. Table 2 addresses model learning, Table 3 the Infoprop mechanism, and Table 4 training the model-free agent."
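The "Dataset Splits" row above notes that the paper organizes data into an environment replay buffer (D_env) and a model replay buffer (D_mod) rather than fixed train/validation/test partitions. The following is a minimal, hypothetical sketch of how a generic Dyna-style loop separates these two buffers; the function names, toy scalar dynamics, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import random

def dyna_style_training(env_step, model_step, policy, n_iters=3,
                        rollout_length=5, rollouts_per_iter=4):
    """Toy Dyna-style loop: real transitions go to d_env, model rollouts
    branched from real states go to d_mod. All names are illustrative."""
    d_env, d_mod = [], []          # environment / model replay buffers
    state = 0.0                    # toy scalar state
    for _ in range(n_iters):
        # 1) Collect one real transition and store it in D_env.
        action = policy(state)
        next_state = env_step(state, action)
        d_env.append((state, action, next_state))
        state = next_state
        # 2) Branch short model rollouts from states sampled out of D_env.
        for _ in range(rollouts_per_iter):
            s = random.choice(d_env)[0]
            for _ in range(rollout_length):
                a = policy(s)
                s_next = model_step(s, a)
                d_mod.append((s, a, s_next))
                s = s_next
        # 3) An off-policy agent would now be trained on D_env and D_mod;
        #    agent updates are omitted in this sketch.
    return d_env, d_mod
```

Because rollout data is generated by the model rather than held out from a dataset, the usual split percentages do not apply; the ratio of model to real data is instead governed by rollout length and rollout count, which is why the report flags "Dataset Splits" as absent.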