Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning
Authors: Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on diverse robotic manipulation tasks within the Meta World benchmark demonstrate the effectiveness and versatility of GO-Skill. |
| Researcher Affiliation | Collaboration | ¹C2DL, Institute of Automation, Chinese Academy of Sciences; ²School of Artificial Intelligence, University of Chinese Academy of Sciences; ³Tencent AI Lab; ⁴Tsinghua University; ⁵AiRiA. Correspondence to: Kai Li <EMAIL>, Junliang Xing <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: GO-Skill Extraction and Enhancement; Algorithm 2: GO-Skill Policy Learning |
| Open Source Code | No | The paper states: "We implement all experiments using the Prompt-DT (Xu et al., 2022) codebase2, and access the Meta World environment to this framework.", with footnote 2 pointing to https://github.com/mxu34/prompt-dt. This refers to a third-party codebase used as a baseline or framework, not the authors' own implementation of GO-Skill. There is no explicit statement or link indicating that the source code for GO-Skill is provided. |
| Open Datasets | Yes | Our experiments are evaluated on the Meta World benchmark (Yu et al., 2020b), an MTRL benchmark consisting of 50 robotic manipulation tasks... the datasets are sourced from SAC-Replay (Haarnoja et al., 2018), ranging from random to expert experiences |
| Dataset Splits | Yes | We define two distinct dataset settings: (1) Near-Optimal, which includes the complete experience (100M transitions) from random to expert-level performance in SAC-Replay, and (2) Sub-Optimal, which contains the first 50% (50M transitions) of the near-optimal dataset for each task... We experiment with three different setups: (1) MT50, which includes the full set of 50 tasks, (2) MT30, a subset containing 30 operational tasks from MT50, and (3) ML45, where 45 tasks are used for pre-training and the remaining 5 tasks are used for fine-tuning evaluation. |
| Hardware Specification | Yes | We use an NVIDIA GeForce RTX 3090 GPU for training and an AMD EPYC 7742 64-Core Processor for evaluation with the environments. |
| Software Dependencies | Yes | We implement all experiments using the Prompt-DT (Xu et al., 2022) codebase2, and adapt the Meta World environment to this framework... For the offline dataset, we follow the approach outlined by He et al. (2023), using the Soft Actor-Critic (SAC) algorithm (Haarnoja et al., 2018)... our experiments are conducted on the stable version, Meta World-V2 1. |
| Experiment Setup | Yes | We present the common hyper-parameters in Table 4 and the additional hyper-parameters for GO-Skill in Table 5. Notably, the total number of iterations for the baselines is 1e5. GO-Skill first performs skill extraction for 3e4 iterations, followed by 7e4 iterations of parallel skill enhancement and policy learning. |
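The two-phase iteration budget described in the setup row can be sketched as a simple split; the function name and return structure below are illustrative assumptions, not the authors' implementation:

```python
def go_skill_iteration_budget(total_iters: int = 100_000,
                              extraction_iters: int = 30_000) -> dict:
    """Split GO-Skill's training budget: skill extraction runs first (3e4
    iterations), then skill enhancement and policy learning proceed in
    parallel for the remainder, matching the baselines' 1e5 total."""
    assert extraction_iters <= total_iters
    return {
        "skill_extraction": extraction_iters,
        "enhancement_and_policy_learning": total_iters - extraction_iters,
    }

# The split reported in the paper:
budget = go_skill_iteration_budget()
print(budget)  # {'skill_extraction': 30000, 'enhancement_and_policy_learning': 70000}
```

This makes explicit that GO-Skill's total compute budget equals the baselines' 1e5 iterations, with 30% spent on skill extraction.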