Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning

Authors: Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on diverse robotic manipulation tasks within the Meta World benchmark demonstrate the effectiveness and versatility of GO-Skill."
Researcher Affiliation | Collaboration | 1C2DL, Institute of Automation, Chinese Academy of Sciences; 2School of Artificial Intelligence, University of Chinese Academy of Sciences; 3Tencent AI Lab; 4Tsinghua University; 5AiRiA. Correspondence to: Kai Li <EMAIL>, Junliang Xing <EMAIL>.
Pseudocode | Yes | Algorithm 1: GO-Skill Extraction and Enhancement; Algorithm 2: GO-Skill Policy Learning.
Open Source Code | No | The paper states: "We implement all experiments using the Prompt-DT (Xu et al., 2022) codebase2, and access the Meta World environment to this framework.", with footnote 2 pointing to https://github.com/mxu34/prompt-dt. This is a third-party codebase used as the implementation framework, not the authors' own release of GO-Skill. No statement or link indicates that GO-Skill's source code is provided.
Open Datasets | Yes | "Our experiments are evaluated on the Meta World benchmark (Yu et al., 2020b), an MTRL benchmark consisting of 50 robotic manipulation tasks... the dataset are sourced from SAC-Replay (Haarnoja et al., 2018) ranging from random to expert experiences."
Dataset Splits | Yes | "We define two distinct dataset settings: (1) Near-Optimal, which includes the complete experience (100M transitions) from random to expert-level performance in SAC-Replay, and (2) Sub-Optimal, which contains the first 50% (50M transitions) of the near-optimal dataset for each task... We experiment with three different setups: (1) MT50, which includes the full set of 50 tasks, (2) MT30, a subset containing 30 operational tasks from MT50, and (3) ML45, where 45 tasks are used for pre-training and the remaining 5 tasks are used for fine-tuning evaluation."
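The split construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the per-task buffers are stand-ins, and the MT30 subset shown here is just the first 30 tasks, whereas the paper selects a specific set of 30 operational tasks.

```python
# Hedged sketch of the dataset splits described in the paper.
# All names (sub_optimal_split, task_*) are hypothetical placeholders.

def sub_optimal_split(transitions):
    """Sub-Optimal = first 50% of each task's SAC-Replay experience."""
    return transitions[: len(transitions) // 2]

MT50 = [f"task_{i}" for i in range(50)]              # full 50-task setup
MT30 = MT50[:30]                                     # illustrative 30-task subset
ML45_pretrain, ML45_finetune = MT50[:45], MT50[45:]  # 45 pre-train / 5 fine-tune

# Stand-in per-task transition buffers (the real ones hold SAC-Replay data).
near_optimal = {t: list(range(100)) for t in MT50}
sub_optimal = {t: sub_optimal_split(buf) for t, buf in near_optimal.items()}
```

The key point is that Sub-Optimal is a temporal prefix of Near-Optimal (the earlier, lower-quality half of each task's replay), not a random subsample.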
Hardware Specification | Yes | "We use NVIDIA Geforce RTX 3090 GPU for training and AMD EPYC 7742 64-Core Processor for evaluation with the environments."
Software Dependencies | Yes | "We implement all experiments using the Prompt-DT (Xu et al., 2022) codebase2, and access the Meta World environment to this framework... For the offline dataset, we follow the approach outlined by He et al. (2023), using the Soft Actor-Critic (SAC) algorithm (Haarnoja et al., 2018)... our experiments are conducted on the stable version, Meta World-V2."
Experiment Setup | Yes | "We present the common hyper-parameters in Table 4 and the additional hyper-parameters for GO-Skill in Table 5. Notably, the total number of iterations for the baselines is 1e5. GO-Skill first performs skill extraction using 3e4 iterations, followed by parallel iterations of skill enhancement and policy learning for 7e4."
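The reported iteration budget implies a two-phase schedule: 3e4 iterations of skill extraction, then 7e4 iterations in which skill enhancement and policy learning run in parallel, for 1e5 total (matching the baselines). A minimal sketch of that schedule, with hypothetical step functions standing in for the actual training updates:

```python
# Hedged sketch of GO-Skill's two-phase training schedule as described
# in the paper; extract_step/enhance_step/policy_step are placeholders.

TOTAL_ITERS = int(1e5)
EXTRACTION_ITERS = int(3e4)

def train(extract_step, enhance_step, policy_step):
    for it in range(TOTAL_ITERS):
        if it < EXTRACTION_ITERS:
            extract_step(it)      # phase 1: skill extraction only
        else:
            enhance_step(it)      # phase 2: skill enhancement and
            policy_step(it)       #          policy learning in parallel

# Count how often each phase runs, to check the budget splits as stated.
calls = {"extract": 0, "enhance": 0, "policy": 0}
train(lambda i: calls.__setitem__("extract", calls["extract"] + 1),
      lambda i: calls.__setitem__("enhance", calls["enhance"] + 1),
      lambda i: calls.__setitem__("policy", calls["policy"] + 1))
```

Under this reading, the skill modules get 3e4 dedicated iterations before the high-level policy ever trains, so the total compute matches the baselines' 1e5 iterations rather than exceeding it.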