Temporal Distance-aware Transition Augmentation for Offline Model-based Reinforcement Learning
Authors: Dongsu Lee, Minhae Kwon
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we assess TempDATA on a suite of goal-oriented benchmarks covering state-based domains, e.g., AntMaze (Brockman et al., 2016), Kitchen (Gupta et al., 2019), CALVIN (Mees et al., 2022), and a pixel-based Kitchen variant. We confirm that our solution outperforms previous baselines. Furthermore, the proposed solution achieves better or comparable performance compared to prior goal-conditioned MFRL methods. To our knowledge, TempDATA is the first offline MBRL approach to enable efficient transition augmentation for sparse-reward and long-horizon challenges. ... Section 5. Experiments: In our experiments, we evaluate TempDATA's performance on diverse downstream tasks. ... Table 1. Evaluating TempDATA (Proposed) on the D4RL AntMaze environment. ... Figure 8 presents an ablation study comparing four algorithmic variants on D4RL datasets. |
| Researcher Affiliation | Academia | This work was partly done at ¹Carnegie Mellon University, Pittsburgh, USA; ²Department of Intelligent Semiconductors, Soongsil University, Seoul, South Korea; ³School of Electronic Engineering, Soongsil University, Seoul, South Korea. Correspondence to: Minhae Kwon <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: TempDATA with offline RL |
| Open Source Code | Yes | Our implementation for TempDATA is based on JaxRL and is available at the following repository. |
| Open Datasets | Yes | AntMaze is a widely-used benchmark environment (Brockman et al., 2016), where an 8-DoF Ant robot navigates to reach a given goal state from the initial one. We consider four different levels of this environment (i.e., Umaze, Medium, Large, and Ultra) and its dataset from the D4RL benchmark (Fu et al., 2020). Kitchen is a realistic long-horizon benchmark environment (Gupta et al., 2019), where a 9-DoF Franka robot manipulates four different sub-tasks (i.e., open a drawer, move a kettle, etc.). We also use the D4RL benchmark dataset... CALVIN is another environment designed for long-horizon manipulation tasks (Mees et al., 2022)... |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions using '50 test trials' for evaluation but does not detail how the data was partitioned, or whether standard D4RL splits were used without being specified. |
| Hardware Specification | Yes | Our implementation for TempDATA is based on JaxRL and is available at the following repository. We run our experiments on RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions that the implementation is based on 'JaxRL' and uses 'Adam' as an optimizer, but it does not specify version numbers for these or any other key software libraries or dependencies, such as the Python version or specific JAX/TensorFlow/PyTorch versions. |
| Experiment Setup | Yes | C.2. Hyperparameters: Iterations 10^6 (state-based), 5×10^5 (pixel-based); Learning rate 3×10^-4 (all networks); Optimizer Adam (Diederik, 2015); Batch size 512 (AntMaze), 256 (Franka Kitchen), 128 (CALVIN); Number of evaluation episodes 50 (all tasks); Autoencoder network dimensions [512, 512, 512, {32, 10}, 512, 512, 512], with latent dimension 32 (AntMaze) or 10 (CALVIN, Franka Kitchen); Discount factor for autoencoder 0.99 (all tasks); Expectile coefficient for autoencoder 0.95 (AntMaze), 0.97 (CALVIN, Franka Kitchen), 0.7 (pixel-based); Dynamics model network dimensions [512, 512, 512]; Number of rollout steps 3; Critic network dimensions [512, 512, 512]; Actor network dimensions [512, 512, 512]; Target smoothing coefficient 5×10^-3; Discount factor for offline RL 0.99; Inverse temperature for offline RL 10 (AntMaze), 3 (CALVIN, Franka Kitchen); Expectile coefficient for offline RL 0.9 (state-based), 0.7 (pixel-based) |
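The hyperparameter listing quoted above can be sketched as a configuration builder. This is a hypothetical illustration of how the reported values combine per domain and observation type; the key names and `get_config` helper are our own and do not come from the paper's released code.

```python
# Hypothetical sketch of the TempDATA hyperparameters reported in Appendix C.2.
# Key names are illustrative; they are not taken from the released implementation.

PER_DOMAIN = {
    "antmaze": {"batch_size": 512, "latent_dim": 32,
                "autoencoder_expectile": 0.95, "inverse_temperature": 10},
    "kitchen": {"batch_size": 256, "latent_dim": 10,
                "autoencoder_expectile": 0.97, "inverse_temperature": 3},
    "calvin":  {"batch_size": 128, "latent_dim": 10,
                "autoencoder_expectile": 0.97, "inverse_temperature": 3},
}

def get_config(domain: str, pixel_based: bool = False) -> dict:
    """Assemble the reported hyperparameters for one benchmark domain."""
    cfg = {
        # Settings shared across all tasks
        "iterations": 5 * 10**5 if pixel_based else 10**6,
        "learning_rate": 3e-4,           # all networks, Adam optimizer
        "eval_episodes": 50,
        "autoencoder_discount": 0.99,
        "offline_rl_discount": 0.99,
        "rollout_steps": 3,
        "target_smoothing": 5e-3,
        "offline_rl_expectile": 0.7 if pixel_based else 0.9,
        # Dynamics, critic, and actor networks all use [512, 512, 512]
        "hidden_dims": (512, 512, 512),
    }
    cfg.update(PER_DOMAIN[domain])
    if pixel_based:
        # The pixel-based variant uses a lower autoencoder expectile
        cfg["autoencoder_expectile"] = 0.7
    return cfg
```

Under this reading, the autoencoder's `{32, 10}` bottleneck resolves to a per-domain latent dimension, while the pixel-based setting overrides the iteration count and both expectile coefficients.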