Skill Disentanglement in Reproducing Kernel Hilbert Space

Authors: Vedant Dave, Elmar Rueckert

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results on the Unsupervised RL Benchmark show that HUSD outperforms previous exploration algorithms on state-based tasks. ... We validate our approach on maze tasks and the Unsupervised Reinforcement Learning Benchmark (URLB, Laskin et al. (2021)), demonstrating that HUSD is capable of learning a diverse and far-reaching set of skills. ... Experiments: In this section, we first conduct a qualitative analysis of the behaviors exhibited by different skills learned with HUSD and recent relevant competence-based methods ... Next, we perform unsupervised training of agents on the DeepMind Control Suite (DMC) ... and then evaluate the adaptation efficiency of these learned skills in 12 downstream tasks using the Unsupervised Reinforcement Learning Benchmark (URLB) ... Ablation Study: In this section, we examine the effect of the λ parameter (weighting the MMD) on the results.
Researcher Affiliation | Academia | Cyber-Physical-Systems Lab, Montanuniversität Leoben; EMAIL, EMAIL
Pseudocode | No | The paper describes the methodology using mathematical formulations and descriptive text, but it does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | No | The paper does not contain any explicit statement about releasing code or a link to a source-code repository for the methodology described.
Open Datasets | Yes | We validate our approach on maze tasks and the Unsupervised Reinforcement Learning Benchmark (URLB, Laskin et al. (2021)), demonstrating that HUSD is capable of learning a diverse and far-reaching set of skills. ... we perform unsupervised training of agents on the DeepMind Control Suite (DMC) (Tassa et al. 2018) and then evaluate the adaptation efficiency of these learned skills in 12 downstream tasks using the Unsupervised Reinforcement Learning Benchmark (URLB) (Laskin et al. 2021)
Dataset Splits | No | The paper discusses training steps, episodes, and the number of seeds for experiments (e.g., 'pretrained for 2M steps', 'finetuned for 100K steps', 'evaluated using 10 skills for 2500 episodes', 'evaluated 12 seeds for every task and method'), but it does not specify explicit training/validation/test dataset splits with percentages, sample counts, or citations to predefined data partitions for reproduction.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies | No | The paper mentions using DDPG as the base RL algorithm and references other methods, but it does not specify any software dependencies with version numbers (e.g., Python version, specific library versions like PyTorch 1.x).
Experiment Setup | No | The paper mentions high-level experimental settings such as 'pretrained for 2M steps' and 'finetuned for 100K steps', notes that 'all other training parameters [are] kept the same', and states that 'parameters of all the methods and other environments (Tree) are provided in the Supplementary Material' and that 'A detailed description of these hyperparameters is provided in the Supplementary Material.' It thus defers the concrete hyperparameter values and system-level training settings to the supplementary material rather than including them in the main text.
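For context on the λ-weighted MMD term mentioned in the ablation above: the Maximum Mean Discrepancy (MMD) measures the distance between two sample distributions via their kernel mean embeddings in a reproducing kernel Hilbert space. The following is a minimal sketch only, not the authors' implementation; the RBF kernel choice, the bandwidth `gamma`, and the way λ scales the term are all assumptions for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise RBF kernel matrix: k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-gamma * sq_dists)

def mmd2(x, y, gamma=1.0):
    # Biased estimate of MMD^2 between samples x ~ P and y ~ Q:
    # E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())

# Hypothetical λ-weighted term, as one might weight MMD in a training objective
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(64, 4))  # samples from distribution P
y = rng.normal(2.0, 1.0, size=(64, 4))  # samples from distribution Q
lam = 0.5
weighted_term = lam * mmd2(x, y)
```

Identical sample sets yield an MMD² of zero, while samples from well-separated distributions yield a clearly positive value, which is what makes a λ-weighted MMD term usable as a separation pressure in an objective.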