Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning
Authors: Jinmin He, Kai Li, Yifan Zang, Haobo Fu, Qiang Fu, Junliang Xing, Jian Cheng
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on diverse robotic manipulation tasks within the Meta World benchmark demonstrate the effectiveness and versatility of GO-Skill. |
| Researcher Affiliation | Collaboration | ¹C2DL, Institute of Automation, Chinese Academy of Sciences; ²School of Artificial Intelligence, University of Chinese Academy of Sciences; ³Tencent AI Lab; ⁴Tsinghua University; ⁵AiRiA. Correspondence to: Kai Li <EMAIL>, Junliang Xing <EMAIL>. |
| Pseudocode | Yes | Algorithm 1: GO-Skill Extraction and Enhancement; Algorithm 2: GO-Skill Policy Learning |
| Open Source Code | No | The paper states: "We implement all experiments using the Prompt-DT (Xu et al., 2022) codebase2, and access the Meta World environment to this framework.", with footnote 2 pointing to https://github.com/mxu34/prompt-dt. This refers to a third-party codebase used as a baseline or framework, not the authors' own implementation of GO-Skill. There is no explicit statement or link indicating that the source code for GO-Skill is provided. |
| Open Datasets | Yes | Our experiments are evaluated on the Meta World benchmark (Yu et al., 2020b), an MTRL benchmark consisting of 50 robotic manipulation tasks... the datasets are sourced from SAC-Replay (Haarnoja et al., 2018), ranging from random to expert experiences |
| Dataset Splits | Yes | We define two distinct dataset settings: (1) Near-Optimal, which includes the complete experience (100M transitions) from random to expert-level performance in SAC-Replay, and (2) Sub-Optimal, which contains the first 50% (50M transitions) of the near-optimal dataset for each task... We experiment with three different setups: (1) MT50, which includes the full set of 50 tasks, (2) MT30, a subset containing 30 operational tasks from MT50, and (3) ML45, where 45 tasks are used for pre-training and the remaining 5 tasks are used for fine-tuning evaluation. |
| Hardware Specification | Yes | We use an NVIDIA GeForce RTX 3090 GPU for training and an AMD EPYC 7742 64-Core Processor for evaluation with the environments. |
| Software Dependencies | Yes | We implement all experiments using the Prompt-DT (Xu et al., 2022) codebase2, and adapt the Meta World environment to this framework... For the offline dataset, we follow the approach outlined by He et al. (2023), using the Soft Actor-Critic (SAC) algorithm (Haarnoja et al., 2018)... our experiments are conducted on the stable version, Meta World-V2 1. |
| Experiment Setup | Yes | We present the common hyper-parameters in Table 4 and the additional hyper-parameters for GO-Skill in Table 5. Notably, the total number of iterations for the baselines is 1e5. GO-Skill first performs skill extraction for 3e4 iterations, followed by 7e4 iterations of parallel skill enhancement and policy learning. |
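The two-phase iteration budget described in the setup row can be sketched as a simple split; the function name and return structure below are illustrative assumptions, not the authors' implementation:

```python
def go_skill_iteration_budget(total_iters: int = 100_000,
                              extraction_iters: int = 30_000) -> dict:
    """Split GO-Skill's training budget: skill extraction runs first (3e4
    iterations), then skill enhancement and policy learning proceed in
    parallel for the remainder, matching the baselines' 1e5 total."""
    assert extraction_iters <= total_iters
    return {
        "skill_extraction": extraction_iters,
        "enhancement_and_policy_learning": total_iters - extraction_iters,
    }

# The split reported in the paper:
budget = go_skill_iteration_budget()
print(budget)  # {'skill_extraction': 30000, 'enhancement_and_policy_learning': 70000}
```

This makes explicit that GO-Skill's total compute budget equals the baselines' 1e5 iterations, with 30% spent on skill extraction.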