SR-Reward: Taking The Path More Traveled
Authors: Seyed Mahdi B. Azad, Zahra Padar, Gabriel Kalweit, Joschka Boedecker
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on D4RL as well as Maniskill Robot Manipulation environments, achieving competitive results compared to offline RL algorithms with access to true rewards and imitation learning (IL) techniques like behavioral cloning. Code available at: https://github.com/Erfi/SR-Reward |
| Researcher Affiliation | Academia | Seyed Mahdi B. Azad EMAIL Department of Computer Science University of Freiburg; Zahra Padar EMAIL Department of Computer Science University of Freiburg; Gabriel Kalweit EMAIL Department of Computer Science University of Freiburg; Joschka Boedecker EMAIL Department of Computer Science University of Freiburg |
| Pseudocode | Yes | Algorithm 1 shows the pseudocode for training the SR-Reward and the offline RL in the same loop. |
| Open Source Code | Yes | Code available at: https://github.com/Erfi/SR-Reward |
| Open Datasets | Yes | We evaluate our method on D4RL as well as Maniskill Robot Manipulation environments... We use the offline datasets from D4RL (Fu et al., 2020), and follow their normalization procedures... For Maniskill2 environments (Pick Cube, Stack Cube, Turn Faucet)... |
| Dataset Splits | No | The paper uses well-known benchmark datasets (D4RL, Maniskill2) but does not explicitly describe how these datasets are split into training, validation, and test sets. It describes the evaluation protocol (e.g., '25 evaluation rollouts', '50 fresh rollouts') but not dataset splitting. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Hyperparameters remain largely consistent across environments, with key parameters listed in Table 2 (Appendix D). Each agent is trained for one million gradient steps (two million gradient steps for ManiSkill environments), with five different random seeds used per task. |
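The "Pseudocode" row above refers to the paper's Algorithm 1, which trains the SR-Reward and the offline RL agent in the same loop. As a rough intuition for the successor-representation component (not the paper's implementation; a minimal tabular sketch with hypothetical state indexing, learning rate, and reward definition), the SR can be learned by TD updates on demonstration transitions, and a demonstration-derived reward can then favor frequently visited states, i.e. "the path more traveled":

```python
import numpy as np

def learn_sr(transitions, n_states, gamma=0.99, alpha=0.1, epochs=500):
    """Tabular successor representation learned by TD on demo transitions.

    M[s, s2] estimates the expected discounted number of future visits
    to state s2 when starting from state s and following the demo policy.
    """
    M = np.zeros((n_states, n_states))
    one_hot = np.eye(n_states)
    for _ in range(epochs):
        for s, s_next in transitions:
            # TD target: immediate occupancy of s plus discounted SR of s_next.
            td_target = one_hot[s] + gamma * M[s_next]
            M[s] += alpha * (td_target - M[s])
    return M

# Toy demo data: a 4-state chain 0 -> 1 -> 2 -> 3, with 3 absorbing.
demo = [(0, 1), (1, 2), (2, 3), (3, 3)]
M = learn_sr(demo, n_states=4)

# One plausible (hypothetical) reward: normalized discounted visitation
# under the demo start state, so states on the demonstrated path score high.
reward = M[0] / M[0].sum()
```

In the paper this idea is realized with function approximation and trained jointly with an off-the-shelf offline RL algorithm; the tabular version here only illustrates the SR update itself.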