On Rollouts in Model-Based Reinforcement Learning

Authors: Bernd Frauenknecht, Devdutt Subhasish, Friedrich Solowjow, Sebastian Trimpe

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the capabilities of Infoprop in the Infoprop-Dyna algorithm, reporting state-of-the-art performance in Dyna-style MBRL on common MuJoCo benchmark tasks while substantially increasing rollout length and data quality."
Researcher Affiliation | Academia | Bernd Frauenknecht, Devdutt Subhasish, Friedrich Solowjow, and Sebastian Trimpe; Institute for Data Science in Mechanical Engineering, RWTH Aachen University, Aachen, 52062, Germany; EMAIL
Pseudocode | Yes | Pseudocode is provided in Algorithm 2 of Appendix C, along with Algorithm 1 (Infoprop) and Algorithm 3 (Infoprop-Dyna; pseudocode adapted from Janner et al. (2019)).
Open Source Code | Yes | https://github.com/Data-Science-in-Mechanical-Engineering/infoprop
Open Datasets | Yes | "To demonstrate the benefits of the Infoprop mechanism, we compare Infoprop-Dyna to state-of-the-art Dyna-style MBRL algorithms on MuJoCo (Todorov et al., 2012) benchmark tasks."
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits as percentages, sample counts, or file names. It discusses environment and model replay buffers (D_env and D_mod) but no explicit data partitioning with specific ratios.
Hardware Specification | No | The paper states: "Further, the authors gratefully acknowledge the computing time provided to them at the NHR Center NHR4CES at RWTH Aachen University (project number p0022301)." This does not specify concrete hardware details such as GPU models, CPU models, or memory amounts.
Software Dependencies | No | The paper mentions "We used Weights&Biases for logging our experiments" but does not specify a version number for Weights&Biases or any other software dependency.
Experiment Setup | Yes | "The respective hyperparameters for Infoprop-Dyna on MuJoCo are given below. Table 2 addresses model learning, Table 3 the Infoprop mechanism, and Table 4 training the model-free agent."
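The "Dataset Splits" row above notes that the paper organizes data into an environment replay buffer (D_env) and a model replay buffer (D_mod) rather than fixed train/validation/test partitions. The following is a minimal, hypothetical sketch of how a generic Dyna-style loop separates these two buffers; the function names, toy scalar dynamics, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import random

def dyna_style_training(env_step, model_step, policy, n_iters=3,
                        rollout_length=5, rollouts_per_iter=4):
    """Toy Dyna-style loop: real transitions go to d_env, model rollouts
    branched from real states go to d_mod. All names are illustrative."""
    d_env, d_mod = [], []          # environment / model replay buffers
    state = 0.0                    # toy scalar state
    for _ in range(n_iters):
        # 1) Collect one real transition and store it in D_env.
        action = policy(state)
        next_state = env_step(state, action)
        d_env.append((state, action, next_state))
        state = next_state
        # 2) Branch short model rollouts from states sampled out of D_env.
        for _ in range(rollouts_per_iter):
            s = random.choice(d_env)[0]
            for _ in range(rollout_length):
                a = policy(s)
                s_next = model_step(s, a)
                d_mod.append((s, a, s_next))
                s = s_next
        # 3) An off-policy agent would now be trained on D_env and D_mod;
        #    agent updates are omitted in this sketch.
    return d_env, d_mod
```

Because rollout data is generated by the model rather than held out from a dataset, the usual split percentages do not apply; the ratio of model to real data is instead governed by rollout length and rollout count, which is why the report flags "Dataset Splits" as absent.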