Efficient Skill Discovery via Regret-Aware Optimization
Authors: He Zhang, Ming Zhou, Shaopeng Zhai, Ying Sun, Hui Xiong
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on environments with varying complexities and dimension sizes. Empirical results show that our method outperforms baselines in both efficiency and diversity. Moreover, our method achieves a 15% zero-shot improvement in high-dimensional environments compared to existing methods. |
| Researcher Affiliation | Academia | (1) Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou); (2) Shanghai AI Lab; (3) Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong SAR. Correspondence to: Ming Zhou <EMAIL>, Ying Sun <EMAIL>, Hui Xiong <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 RSD |
| Open Source Code | Yes | Our code is open-source at https://github.com/ZhHe11/RSD. |
| Open Datasets | Yes | We compare our method with baselines on the Ant environment from dm_control (Tunyasuvunakool et al., 2020), and on Maze2d-large, Antmaze-medium, and Antmaze-large from D4RL (Fu et al., 2020). |
| Dataset Splits | No | The paper uses standard reinforcement learning environments (dm_control, D4RL) where data is generated through interaction. It does not provide explicit training/validation/test splits for a fixed dataset; instead, it describes how trajectories are collected and sampled from a buffer during online learning. |
| Hardware Specification | Yes | All experiments were conducted on a single NVIDIA A100 GPU with PyTorch 2.0.1. |
| Software Dependencies | Yes | All experiments were conducted on a single NVIDIA A100 GPU with PyTorch 2.0.1. |
| Experiment Setup | Yes | C.1. Parameter Setting (hyperparameter: value). MAX PATH LENGTH: 300; TRAJECTORY BATCH SIZE: 16; SAC MAX BUFFER SIZE: 3000000; OPTION DIM: 2; LEARNING RATE (common): 0.0001; LEARNING RATE (ϕ): 0.001; MODEL LAYER: 2; MODEL DIM: 1024; DISCOUNT FACTOR: 0.99; BATCH SIZE: 1024; DUAL SLACK: 0.001; α1 (πθ2): 5; α2 (πθ2): 1; MAX SIZE (Pz) l: 15; STEPS IN EACH STAGE: 50 |
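
For readers who want to reuse the reported settings, the hyperparameters in the Experiment Setup row can be collected into a single configuration object. The following is a minimal sketch, not the authors' configuration code: the class name `RSDHyperparameters` and every field name are hypothetical, and only the numeric values are taken from the table above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RSDHyperparameters:
    """Values copied from the reported parameter table (C.1).

    All field names are hypothetical; the released code may use different ones.
    """
    max_path_length: int = 300
    trajectory_batch_size: int = 16
    sac_max_buffer_size: int = 3_000_000
    option_dim: int = 2
    learning_rate_common: float = 1e-4   # "LEARNING RATE (common)"
    learning_rate_phi: float = 1e-3      # "LEARNING RATE (ϕ)"
    model_layers: int = 2
    model_dim: int = 1024
    discount_factor: float = 0.99
    batch_size: int = 1024
    dual_slack: float = 1e-3
    alpha_1: float = 5.0                 # "α1 (πθ2)"
    alpha_2: float = 1.0                 # "α2 (πθ2)"
    max_size_l: int = 15                 # "MAX SIZE (Pz) l"
    steps_in_each_stage: int = 50

    def __post_init__(self) -> None:
        # Basic sanity checks so that a mistyped override fails early.
        if not 0.0 < self.discount_factor <= 1.0:
            raise ValueError("discount_factor must be in (0, 1]")
        if self.batch_size <= 0 or self.trajectory_batch_size <= 0:
            raise ValueError("batch sizes must be positive")


config = RSDHyperparameters()
config_dict = asdict(config)  # plain dict, convenient for logging
```

Freezing the dataclass keeps the reported values from being modified accidentally during a run; an override can be expressed by constructing a new instance, for example `RSDHyperparameters(option_dim=4)`.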