SR-Reward: Taking The Path More Traveled

Authors: Seyed Mahdi B. Azad, Zahra Padar, Gabriel Kalweit, Joschka Boedecker

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on D4RL as well as Maniskill Robot Manipulation environments, achieving competitive results compared to offline RL algorithms with access to true rewards and imitation learning (IL) techniques like behavioral cloning. Code available at: https://github.com/Erfi/SR-Reward
Researcher Affiliation | Academia | Seyed Mahdi B. Azad (EMAIL), Department of Computer Science, University of Freiburg; Zahra Padar (EMAIL), Department of Computer Science, University of Freiburg; Gabriel Kalweit (EMAIL), Department of Computer Science, University of Freiburg; Joschka Boedecker (EMAIL), Department of Computer Science, University of Freiburg
Pseudocode | Yes | Algorithm 1 shows the pseudocode for training the SR-Reward and the offline RL in the same loop.
Open Source Code | Yes | Code available at: https://github.com/Erfi/SR-Reward
Open Datasets | Yes | We evaluate our method on D4RL as well as Maniskill Robot Manipulation environments... We use the offline datasets from D4RL (Fu et al., 2020), and follow their normalization procedures... For Maniskill2 environments (Pick Cube, Stack Cube, Turn Faucet)...
Dataset Splits | No | The paper uses well-known benchmark datasets (D4RL, Maniskill2) but does not explicitly describe how these datasets are split into training, validation, and test sets. It describes the evaluation protocol (e.g., '25 evaluation rollouts', '50 fresh rollouts') but not dataset splitting.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | Hyperparameters remain largely consistent across environments, with key parameters listed in Table 2 (Appendix D). Each agent is trained for one million gradient steps (two million gradient steps for ManiSkill environments), with five different random seeds used per task.