Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning
Authors: Seungho Baek, Taegeon Park, Jongchan Park, Seungjun Oh, Yusung Kim
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate GAS on OGBench (Park et al., 2025a) and D4RL (Fu et al., 2020), spanning diverse dataset types. We compare its performance against offline goal-conditioned and hierarchical baselines. For each dataset, we report the average normalized return across five test-time goals, except for kitchen, which uses a single fixed goal. Each goal is evaluated with 50 rollouts, and results are averaged over 4 random seeds. Bold numbers indicate results that are at least 95% of the best-performing method in each row. Details of the datasets and baselines are provided in Appendices C and D. |
| Researcher Affiliation | Academia | 1Department of Computer Science and Engineering, Sungkyunkwan University, Suwon, Republic of Korea 2Department of Artificial Intelligence, Sungkyunkwan University, Suwon, Republic of Korea. Correspondence to: Yusung Kim <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Task Planning and Execution Algorithm 2 TD-Aware Graph Construction |
| Open Source Code | Yes | Our source code is available at: https://github.com/qortmdgh4141/GAS. |
| Open Datasets | Yes | We evaluate GAS on OGBench (Park et al., 2025a) and D4RL (Fu et al., 2020), spanning diverse dataset types. |
| Dataset Splits | No | For each dataset, we report the average normalized return across five test-time goals, except for kitchen, which uses a single fixed goal. Each goal is evaluated with 50 rollouts, and results are averaged over 4 random seeds. We follow the goal specification protocol of OGBench (Park et al., 2025a), where each task provides five predefined state-goal pairs. |
| Hardware Specification | Yes | We run our experiments on an internal cluster consisting of RTX 3090 GPUs. |
| Software Dependencies | No | Our implementations of GAS and seven baselines are based on JAX (Bradbury et al., 2018). We apply layer normalization (Ba et al., 2016) to all MLP layers. For pixel-based environments, we adopt the Impala CNN (Espeholt et al., 2018) to process image inputs. The nonlinearity is GELU (Hendrycks & Gimpel, 2016) and the optimizer is Adam (Kingma & Ba, 2015). |
| Experiment Setup | Yes | We provide a common list of hyperparameters in Table 7 (shared across all datasets) and task-specific hyperparameters in Table 8. We apply layer normalization (Ba et al., 2016) to all MLP layers. For pixel-based environments, we adopt the Impala CNN (Espeholt et al., 2018) to process image inputs. While most components use 512-dimensional output features, we reduce the output dimension to 32 for the Temporal Distance Representation (TDR) to balance representational capacity and stability, as discussed in Appendix B. Following prior work (Park et al., 2023; 2024c; 2025a), we do not share encoders across components. As a result, in pixel-based environments, we use four separate CNN encoders for TDR, the Q-function, the value function, and the low-level policy. We also apply random crop augmentation (Kostrikov et al., 2021) with a probability of 0.5 to mitigate overfitting (Zheng et al., 2024). |
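The architectural details quoted in the setup cell (layer normalization on every MLP layer, GELU activations, and random-crop augmentation applied with probability 0.5) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the paper's implementation uses JAX, while NumPy stands in here, and the pad width, layer sizes, and function names are assumptions for the example only.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each feature vector to zero mean / unit variance (Ba et al., 2016).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # Tanh approximation of GELU (Hendrycks & Gimpel, 2016).
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mlp_layer(x, w, b):
    # One MLP layer with layer normalization applied, as the paper describes:
    # Dense -> LayerNorm -> GELU.
    return gelu(layer_norm(x @ w + b))

def maybe_random_crop(img, rng, pad=4, p=0.5):
    # Random-crop augmentation applied with probability p, in the style of
    # Kostrikov et al. (2021): pad the image, then crop back to the original
    # size at a random offset. The pad width of 4 is an assumption.
    if rng.random() >= p:
        return img
    h, w, c = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]
```

Applying `maybe_random_crop` per sampled image batch with `p=0.5` matches the stated augmentation probability; the shift-by-padding crop preserves the input resolution expected by the CNN encoders.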