Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning
Authors: Dongsu Lee, Minhae Kwon
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we assess TempDATA on a suite of goal-oriented benchmarks covering state-based domains, e.g., AntMaze (Brockman et al., 2016), Kitchen (Gupta et al., 2019), CALVIN (Mees et al., 2022), and a pixel-based Kitchen variant. We confirm that our solution outperforms previous baselines. Furthermore, the proposed solution achieves better or comparable performance compared to prior goal-conditioned MFRL methods. To our knowledge, TempDATA is the first offline MBRL approach to enable efficient transition augmentation for sparse-reward and long-horizon challenges. ... Section 5. Experiments: In our experiments, we evaluate TempDATA's performance on diverse downstream tasks. ... Table 1. Evaluating TempDATA (Proposed) on the D4RL AntMaze environment. ... Figure 8 presents an ablation study comparing four algorithmic variants on D4RL datasets. |
| Researcher Affiliation | Academia | This work was partly done at ¹Carnegie Mellon University, Pittsburgh, USA; ²Department of Intelligent Semiconductors, Soongsil University, Seoul, South Korea; ³School of Electronic Engineering, Soongsil University, Seoul, South Korea. Correspondence to: Minhae Kwon <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: TempDATA with offline RL |
| Open Source Code | Yes | Our implementation for TempDATA is based on JaxRL and is available at the following repository. |
| Open Datasets | Yes | AntMaze is a widely-used benchmark environment (Brockman et al., 2016), where an 8-DoF Ant robot navigates to reach a given goal state from the initial one. We consider four different levels of this environment (i.e., Umaze, Medium, Large, and Ultra) and its dataset from the D4RL benchmark (Fu et al., 2020). Kitchen is a realistic long-horizon benchmark environment (Gupta et al., 2019), where a 9-DoF Franka robot manipulates four different sub-tasks (i.e., open a drawer, move a kettle, etc.). We also use the D4RL benchmark dataset... CALVIN is another environment designed for long-horizon manipulation tasks (Mees et al., 2022)... |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions using '50 test trials' for evaluation but does not detail how the data was partitioned, or whether standard D4RL splits were used without being specified. |
| Hardware Specification | Yes | Our implementation for TempDATA is based on JaxRL and is available at the following repository. We run our experiments on RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions that the implementation is based on 'JaxRL' and uses 'Adam' as an optimizer, but it does not specify version numbers for these or any other key software libraries or dependencies, such as the Python version or specific JAX/TensorFlow/PyTorch versions. |
| Experiment Setup | Yes | C.2. Hyperparameters: Iterations 10^6 (state-based), 5×10^5 (pixel-based); Learning rate 3×10^-4 (all networks); Optimizer Adam (Diederik, 2015); Batch size 512 (AntMaze), 256 (Franka Kitchen), 128 (CALVIN); Number of evaluation episodes 50 (all tasks); Autoencoder network dimensions [512, 512, 512, {32, 10}, 512, 512, 512], with latent dimension 32 (AntMaze) or 10 (CALVIN, Franka Kitchen); Discount factor for autoencoder 0.99 (all tasks); Expectile coefficient for autoencoder 0.95 (AntMaze), 0.97 (CALVIN, Franka Kitchen), 0.7 (pixel-based); Dynamics model network dimensions [512, 512, 512]; Number of rollout steps 3; Critic network dimensions [512, 512, 512]; Actor network dimensions [512, 512, 512]; Target smoothing coefficient 5×10^-3; Discount factor for offline RL 0.99; Inverse temperature for offline RL 10 (AntMaze), 3 (CALVIN, Franka Kitchen); Expectile coefficient for offline RL 0.9 (state-based), 0.7 (pixel-based) |
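The hyperparameter listing quoted above can be sketched as a configuration builder. This is a hypothetical illustration of how the reported values combine per domain and observation type; the key names and `get_config` helper are our own and do not come from the paper's released code.

```python
# Hypothetical sketch of the TempDATA hyperparameters reported in Appendix C.2.
# Key names are illustrative; they are not taken from the released implementation.

PER_DOMAIN = {
    "antmaze": {"batch_size": 512, "latent_dim": 32,
                "autoencoder_expectile": 0.95, "inverse_temperature": 10},
    "kitchen": {"batch_size": 256, "latent_dim": 10,
                "autoencoder_expectile": 0.97, "inverse_temperature": 3},
    "calvin":  {"batch_size": 128, "latent_dim": 10,
                "autoencoder_expectile": 0.97, "inverse_temperature": 3},
}

def get_config(domain: str, pixel_based: bool = False) -> dict:
    """Assemble the reported hyperparameters for one benchmark domain."""
    cfg = {
        # Settings shared across all tasks
        "iterations": 5 * 10**5 if pixel_based else 10**6,
        "learning_rate": 3e-4,           # all networks, Adam optimizer
        "eval_episodes": 50,
        "autoencoder_discount": 0.99,
        "offline_rl_discount": 0.99,
        "rollout_steps": 3,
        "target_smoothing": 5e-3,
        "offline_rl_expectile": 0.7 if pixel_based else 0.9,
        # Dynamics, critic, and actor networks all use [512, 512, 512]
        "hidden_dims": (512, 512, 512),
    }
    cfg.update(PER_DOMAIN[domain])
    if pixel_based:
        # The pixel-based variant uses a lower autoencoder expectile
        cfg["autoencoder_expectile"] = 0.7
    return cfg
```

Under this reading, the autoencoder's `{32, 10}` bottleneck resolves to a per-domain latent dimension, while the pixel-based setting overrides the iteration count and both expectile coefficients.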