Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning

Authors: Wesley Suttle, Aamodh Suresh, Carlos Nieto-Granda

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally compare the performance of offline RL algorithms for a variety of downstream tasks on datasets generated using BE, Rényi, and Shannon entropy-maximizing policies, as well as the SMM and RND algorithms. We find that offline RL algorithms trained on datasets collected using BE outperform those trained on datasets collected using Shannon entropy, SMM, and RND on all tasks considered, and on 80% of the tasks compared to datasets collected using Rényi entropy.
Researcher Affiliation | Academia | Wesley A. Suttle, Aamodh Suresh, Carlos Nieto-Granda, U.S. Army Research Laboratory, Adelphi, MD 20783, USA. EMAIL, EMAIL, EMAIL
Pseudocode | No | No explicit pseudocode or algorithm blocks are present in the main text of the paper. The methodology is described through mathematical derivations and textual explanations.
Open Source Code | No | The paper states it used the "Unsupervised Reinforcement Learning Benchmark (URLB) framework (Laskin et al., 2021)" and the "Exploratory Data for Offline RL (ExORL) framework (Yarats et al., 2022)" (with footnotes linking to GitHub repositories). However, it does not explicitly state that the authors' own implementation of the behavioral entropy methodology described in this paper is open-sourced or available.
Open Datasets | Yes | "Using standard MuJoCo environments, we experimentally compare the performance of offline RL algorithms for a variety of downstream tasks on datasets generated using BE, Rényi, and Shannon entropy-maximizing policies, as well as the SMM and RND algorithms."
Dataset Splits | No | The paper mentions generating datasets of specific sizes, such as "500K elements", comparing to "10M-element datasets", and states "we performed just 100K offline training steps". However, it does not provide explicit training, validation, or test splits for these datasets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. It mentions using "standard MuJoCo environments" but not the hardware on which these environments were simulated or the experiments conducted.
Software Dependencies | No | The paper mentions using the "Unsupervised Reinforcement Learning Benchmark (URLB) framework (Laskin et al., 2021)" and the "Exploratory Data for Offline RL (ExORL) framework (Yarats et al., 2022)". However, it does not specify version numbers for these frameworks or any other software libraries (e.g., Python, PyTorch, TensorFlow, CUDA) used in the experiments.
Experiment Setup | Yes | "For our experiments, we generated BE, RE, SE, RND, and SMM datasets for the Walker and Quadruped environments using the Unsupervised Reinforcement Learning Benchmark (URLB) framework (Laskin et al., 2021). We subsequently generated t-SNE plots (Hinton & Roweis, 2002) and PHATE plots (Moon et al., 2019) from the BE, RE, SE, RND, and SMM datasets to visualize their varying state space coverage. Finally, we performed offline RL training on all datasets using the Exploratory Data for Offline RL (ExORL) framework (Yarats et al., 2022)." See Table 2 (data generation hyperparameters) and Table 3 (offline RL hyperparameters).
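For readers unfamiliar with the entropy objectives named above, the following is a minimal sketch of the Shannon and Rényi entropies of an empirical state-visitation distribution. This is illustrative only: the paper's behavioral entropy (BE) generalizes these via probability-weighting functions and is not reproduced here, and the function names and the toy distribution are our own.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy H(p) = -sum_i p_i * log(p_i), natural log."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # convention: 0 * log(0) contributes 0
    return float(-np.sum(p * np.log(p)))

def renyi_entropy(p, alpha):
    """Renyi entropy H_a(p) = log(sum_i p_i**a) / (1 - a), for a > 0, a != 1."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

# Toy empirical visitation distribution over 4 discretized states
p = np.array([0.4, 0.3, 0.2, 0.1])
print(shannon_entropy(p))      # below log(4), since p is non-uniform
print(renyi_entropy(p, 0.5))   # alpha < 1 weights rarely visited states more
```

Both quantities are maximized by the uniform distribution (value log n over n states), and the Rényi entropy recovers the Shannon entropy in the limit alpha → 1, which is why these objectives all reward broad state-space coverage when maximized by an exploration policy.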