Offline Hierarchical Reinforcement Learning via Inverse Optimization
Authors: Carolin Schmidt, Daniele Gammelli, James Harrison, Marco Pavone, Filipe Rodrigues
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our framework on robotic and network optimization problems and show that it substantially outperforms end-to-end RL methods and improves robustness. We investigate a variety of instantiations of our framework, both in direct deployment of policies trained offline and when online fine-tuning is performed. Through experiments on robotic tasks, supply chain inventory control, and dynamic vehicle routing, we show how our framework substantially improves the performance of off-the-shelf offline learning algorithms across a diverse set of embodiments and policy structures, while providing the safety guarantees needed for safe, real-world deployment. |
| Researcher Affiliation | Collaboration | Carolin Schmidt¹, Daniele Gammelli², James Harrison³, Marco Pavone², Filipe Rodrigues¹ — ¹Technical University of Denmark, ²Stanford University, ³Google DeepMind. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 OHIO: Offline Hierarchical Reinforcement Learning via Inverse Optimization |
| Open Source Code | Yes | Code and data are available at https://ohio-offline-hierarchical-rl.github.io |
| Open Datasets | Yes | Code and data are available at https://ohio-offline-hierarchical-rl.github.io |
| Dataset Splits | Yes | All datasets used for this experiment consist of 250 episodes of interactions (each consisting of 1000 timesteps). To learn the dynamics model, we use a train/val split of 0.9/0.1. |
| Hardware Specification | Yes | The training of our models was executed on a Tesla V100 16 GB GPU. |
| Software Dependencies | No | No specific software dependencies with version numbers are explicitly listed in the paper. |
| Experiment Setup | Yes | Table 6: Hyperparameters of SAC — Optimizer: Adam; Learning rate: 1×10⁻³; Discount (γ): 0.97; Batch size: 100; Entropy coefficient: 0.3; Target smoothing coefficient (τ): 0.005; Target update interval: 1; Gradient steps per env. interaction: 1 |
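The reported setup details can be made concrete in code. The sketch below collects the Table 6 SAC hyperparameters into a plain config dict and derives the train/val episode counts implied by the dataset description (250 episodes, 0.9/0.1 split). The key names are illustrative, not taken from the paper's released code.

```python
# Hedged sketch: SAC hyperparameters from Table 6 as a config dict.
# Key names are assumptions for illustration, not the authors' identifiers.
sac_config = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "discount_gamma": 0.97,
    "batch_size": 100,
    "entropy_coefficient": 0.3,
    "target_smoothing_tau": 0.005,
    "target_update_interval": 1,
    "gradient_steps_per_env_interaction": 1,
}

# Dataset split implied by the paper: 250 episodes of 1000 timesteps each,
# with a 0.9/0.1 train/validation split for the dynamics model.
n_episodes = 250
n_train = int(0.9 * n_episodes)  # 225 episodes
n_val = n_episodes - n_train     # 25 episodes
print(n_train, n_val)  # 225 25
```

This only restates the numbers already quoted in the table in a machine-readable form; any actual reproduction should take the authoritative values from the released code at https://ohio-offline-hierarchical-rl.github.io.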