reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Efficient Skill Discovery via Regret-Aware Optimization

Authors: He Zhang, Ming Zhou, Shaopeng Zhai, Ying Sun, Hui Xiong

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct experiments on environments with varying complexities and dimension sizes. Empirical results show that our method outperforms baselines in both efficiency and diversity. Moreover, our method achieves a 15% zero shot improvement in high-dimensional environments, compared to existing methods.
Researcher Affiliation	Academia	1Thrust of Artificial Intelligence, The Hong Kong University of Science and Technology (Guangzhou) 2Shanghai AI Lab 3Department of Computer Science and Engineering, The Hong Kong University of Science and Technology Hong Kong SAR. Correspondence to: Ming Zhou <EMAIL>, Ying Sun <EMAIL>, Hui Xiong <EMAIL>.
Pseudocode	Yes	Algorithm 1 RSD
Open Source Code	Yes	Our code is open-source at https://github.com/ZhHe11/RSD.
Open Datasets	Yes	We compare our method with baselines on ant environment from dm_control (Tunyasuvunakool et al., 2020), Maze2d-large, Antmaze-medium, and Antmaze-large from D4RL (Fu et al., 2020).
Dataset Splits	No	The paper uses standard reinforcement learning environments (dm_control, D4RL) where data is generated through interaction. It does not provide explicit training/validation/test splits for a fixed dataset; instead, it details how trajectories are collected and sampled from a buffer during online learning.
Hardware Specification	Yes	All experiments were conducted on a single NVIDIA A100 GPU with PyTorch 2.0.1.
Software Dependencies	Yes	All experiments were conducted on a single NVIDIA A100 GPU with PyTorch 2.0.1.
Experiment Setup	Yes	C.1. Parameter Setting. HYPERPARAMETER: VALUE. MAX PATH LENGTH: 300; TRAJECTORY BATCH SIZE: 16; SAC MAX BUFFER SIZE: 3000000; OPTION DIM: 2; LEARNING RATE (common): 0.0001; LEARNING RATE (ϕ): 0.001; MODEL LAYER: 2; MODEL DIM: 1024; DISCOUNT FACTOR: 0.99; BATCH SIZE: 1024; DUAL SLACK: 0.001; α1 (πθ2): 5; α2 (πθ2): 1; MAX SIZE (Pz) l: 15; STEPS IN EACH STAGE: 50
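
For anyone attempting a reproduction, the hyperparameters reported in the Experiment Setup row could be collected into a single configuration object, roughly as sketched below. The class name `RSDHyperparameters` and all field names are hypothetical and are not taken from the authors' repository; only the values come from the row above. The mapping of "MAX SIZE (Pz) l" to `max_size_pz` is an assumption about a label that is partly garbled in the source.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RSDHyperparameters:
    """Hypothetical container for the values listed under C.1 Parameter Setting.

    Field names are illustrative only; they do not mirror the authors' code.
    """

    max_path_length: int = 300
    trajectory_batch_size: int = 16
    sac_max_buffer_size: int = 3_000_000
    option_dim: int = 2
    learning_rate_common: float = 1e-4   # "LEARNING RATE (common)"
    learning_rate_phi: float = 1e-3      # "LEARNING RATE (ϕ)"
    model_layers: int = 2                # "MODEL LAYER"
    model_dim: int = 1024
    discount_factor: float = 0.99
    batch_size: int = 1024
    dual_slack: float = 1e-3
    alpha_1: float = 5.0                 # "α1 (πθ2)"
    alpha_2: float = 1.0                 # "α2 (πθ2)"
    max_size_pz: int = 15                # "MAX SIZE (Pz) l"; label partly garbled in the source
    steps_in_each_stage: int = 50


config = RSDHyperparameters()
config_dict = asdict(config)  # plain dict, convenient for logging alongside results
```

Freezing the dataclass keeps the reported values from being changed by accident during a run, and `asdict` gives a plain dictionary that can be logged next to the results.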