On Rollouts in Model-Based Reinforcement Learning
Authors: Bernd Frauenknecht, Devdutt Subhasish, Friedrich Solowjow, Sebastian Trimpe
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the capabilities of Infoprop in the Infoprop-Dyna algorithm, reporting state-of-the-art performance in Dyna-style MBRL on common MuJoCo benchmark tasks while substantially increasing rollout length and data quality. |
| Researcher Affiliation | Academia | Bernd Frauenknecht, Devdutt Subhasish, Friedrich Solowjow, and Sebastian Trimpe; Institute for Data Science in Mechanical Engineering, RWTH Aachen University, Aachen, 52062, Germany; EMAIL |
| Pseudocode | Yes | Pseudocode is provided in Algorithm 2 of Appendix C. Further algorithm captions: "Algorithm 1: Infoprop"; "Algorithm 3: Infoprop-Dyna (pseudocode adapted from Janner et al. (2019))". |
| Open Source Code | Yes | https://github.com/Data-Science-in-Mechanical-Engineering/infoprop |
| Open Datasets | Yes | To demonstrate the benefits of the Infoprop mechanism, we compare Infoprop-Dyna to state-of-the-art Dyna-style MBRL algorithms on MuJoCo (Todorov et al., 2012) benchmark tasks. |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits in terms of percentages, sample counts, or specific file names. It discusses environment and model replay buffers (D_env and D_mod) but not explicit data partitioning with specific ratios. |
| Hardware Specification | No | The paper states: "Further, the authors gratefully acknowledge the computing time provided to them at the NHR Center NHR4CES at RWTH Aachen University (project number p0022301)." This does not specify concrete hardware details such as specific GPU models, CPU models, or memory amounts. |
| Software Dependencies | No | The paper mentions "We used Weights & Biases for logging our experiments" but does not specify a version number for Weights & Biases or any other software dependency. |
| Experiment Setup | Yes | The respective hyperparameters for Infoprop-Dyna on MuJoCo are given below. Table 2 addresses model learning, Table 3 the Infoprop mechanism, and Table 4 training the model-free agent. |
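
The report notes that Infoprop-Dyna maintains separate environment and model replay buffers (D_env and D_mod) and that its pseudocode is adapted from Janner et al. (2019). The Python sketch below illustrates only that generic Dyna-style data flow, not the Infoprop mechanism itself: real transitions fill D_env, a learned model branches short rollouts from real states into D_mod, and agent batches mix both buffers. Everything concrete here is a hypothetical illustration choice: the toy 1-D system, the one-parameter linear model, the random policy, the buffer capacities, the rollout horizon, and the real_ratio value. The actual settings are in Tables 2 to 4 of the paper, and the agent update step is omitted.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO buffer of (s, a, r, s_next) transitions."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, n):
        # Uniform sampling with replacement.
        return random.choices(list(self.data), k=n)

    def __len__(self):
        return len(self.data)


def reward(s_next):
    # Hypothetical reward, assumed known to both the real system and the model.
    return -abs(s_next)


def env_step(s, a):
    # Hypothetical toy "real" system: s' = s + a.
    s_next = s + a
    return reward(s_next), s_next


def fit_model(buffer):
    # Least-squares fit of the gain b in s' = s + b * a (hypothetical toy model).
    num = sum(a * (s2 - s) for s, a, _, s2 in buffer.data)
    den = sum(a * a for _, a, _, _ in buffer.data) or 1.0
    b = num / den
    return lambda s, a: s + b * a


def policy(s):
    # Placeholder random policy standing in for the model-free agent.
    return random.uniform(-1.0, 1.0)


def model_rollouts(model, d_env, d_mod, n_starts, horizon):
    # Branch fixed-length model rollouts from states sampled out of D_env.
    for s, _, _, _ in d_env.sample(n_starts):
        for _ in range(horizon):
            a = policy(s)
            s_next = model(s, a)
            d_mod.add((s, a, reward(s_next), s_next))
            s = s_next


def agent_batch(d_env, d_mod, batch_size, real_ratio):
    # Mix a small fraction of real data with model-generated data.
    n_real = int(batch_size * real_ratio)
    return d_env.sample(n_real) + d_mod.sample(batch_size - n_real)


random.seed(0)
d_env, d_mod = ReplayBuffer(10_000), ReplayBuffer(100_000)
s = 0.0
for _ in range(200):
    a = policy(s)
    r, s_next = env_step(s, a)
    d_env.add((s, a, r, s_next))
    s = s_next

model = fit_model(d_env)
model_rollouts(model, d_env, d_mod, n_starts=50, horizon=5)
batch = agent_batch(d_env, d_mod, batch_size=256, real_ratio=0.05)
print(len(d_env), len(d_mod), len(batch))  # 200 250 256
```

Keeping D_env and D_mod separate lets the real-to-model ratio in each agent batch be set explicitly. In this sketch, 12 of the 256 samples come from D_env and the remaining 244 from D_mod.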