Efficient Skill Discovery via Regret-Aware Optimization

Authors: He Zhang, Ming Zhou, Shaopeng Zhai, Ying Sun, Hui Xiong

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We conduct experiments on environments with varying complexity and dimensionality. Empirical results show that our method outperforms baselines in both efficiency and diversity. Moreover, in high-dimensional environments our method achieves a 15% zero-shot improvement over existing methods.
Researcher Affiliation Academia (1) Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou); (2) Shanghai AI Lab; (3) Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR. Correspondence to: Ming Zhou <EMAIL>, Ying Sun <EMAIL>, Hui Xiong <EMAIL>.
Pseudocode Yes Algorithm 1 RSD
Open Source Code Yes Our code is open-source at https://github.com/ZhHe11/RSD.
Open Datasets Yes We compare our method with baselines on the Ant environment from the DeepMind Control Suite (dm_control) (Tunyasuvunakool et al., 2020), and on Maze2d-large, Antmaze-medium, and Antmaze-large from D4RL (Fu et al., 2020).
Dataset Splits No The paper uses standard reinforcement learning environments (dm_control, D4RL) in which data is generated through interaction. It does not provide explicit training/validation/test splits of a fixed dataset; instead, it describes how trajectories are collected during online learning and sampled from a replay buffer.
Hardware Specification Yes All experiments were conducted on a single NVIDIA A100 GPU with PyTorch 2.0.1.
Software Dependencies Yes All experiments were conducted on a single NVIDIA A100 GPU with PyTorch 2.0.1.
Experiment Setup Yes C.1. Parameter Setting
    Hyperparameter               Value
    Max path length              300
    Trajectory batch size        16
    SAC max buffer size          3000000
    Option dim                   2
    Learning rate (common)       0.0001
    Learning rate (ϕ)            0.001
    Model layers                 2
    Model dim                    1024
    Discount factor              0.99
    Batch size                   1024
    Dual slack                   0.001
    α1 (π_θ2)                    5
    α2 (π_θ2)                    1
    Max size of P_z (l)          15
    Steps in each stage          50
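For anyone reproducing the setup, the C.1 values can be collected into a single typed configuration object. The sketch below is illustrative only: the class name `RSDHyperparams`, its field names, and the sanity checks are hypothetical assumptions, not the names or checks used in the authors' repository. Only the numeric values come from the table above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RSDHyperparams:
    """Hypothetical container for the C.1 values; field names are illustrative."""
    max_path_length: int = 300            # environment steps per trajectory
    trajectory_batch_size: int = 16       # trajectories collected per iteration
    sac_max_buffer_size: int = 3_000_000  # replay buffer capacity for SAC
    option_dim: int = 2                   # dimensionality of the option (skill) vector
    lr_common: float = 1e-4               # learning rate shared by most networks
    lr_phi: float = 1e-3                  # separate learning rate for phi
    model_layers: int = 2
    model_dim: int = 1024
    discount: float = 0.99
    batch_size: int = 1024
    dual_slack: float = 1e-3
    alpha1_pi_theta2: float = 5.0         # alpha_1 for pi_theta2
    alpha2_pi_theta2: float = 1.0         # alpha_2 for pi_theta2
    pz_max_size: int = 15                 # l, maximum size of P_z
    steps_per_stage: int = 50

    def __post_init__(self) -> None:
        # Hypothetical sanity checks on the ranges these quantities normally take.
        if not 0.0 < self.discount <= 1.0:
            raise ValueError("discount must be in (0, 1]")
        for name in ("max_path_length", "trajectory_batch_size",
                     "sac_max_buffer_size", "option_dim", "model_layers",
                     "model_dim", "batch_size", "pz_max_size", "steps_per_stage"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


if __name__ == "__main__":
    cfg = RSDHyperparams()
    # Print the table in the same two-column layout as C.1.
    for key, value in asdict(cfg).items():
        print(f"{key:24s} {value}")
```

Freezing the dataclass keeps a run's settings immutable once it is created. Overrides for ablations can be passed at construction time, for example `RSDHyperparams(option_dim=4)`.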