Language Guided Skill Discovery

Authors: Seungeun Rho, Laura Smith, Tianyu Li, Sergey Levine, Xue Bin Peng, Sehoon Ha

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we evaluate our proposed LGSD by conducting a series of experiments on continuous control environments, encompassing both locomotion and manipulation setups. We aim to answer four questions: (1) Can prompting constrain the skill space into a desired semantic subspace? (2) Can language guidance lead to obtaining more diverse skills compared to unsupervised skill discovery baselines? (3) Can we utilize learned skills for solving downstream tasks? (4) Can we employ learned skills using natural language? Experimental setup We trained our algorithm and baselines using Isaac Gym (Makoviychuk et al., 2021), a high-throughput GPU-based physics simulator. For the language model, we employed gpt-4-turbo-2024-04-09 (Achiam et al., 2023). We set the temperature parameter of the language model to 0 to get a consistent, low-variance measure of dlang. To reduce the number of unique queries, we discretized states, cached the inputs and outputs of these queries, and reused them during training. We provide the exact prompts used for each experiment in Appendix G.
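The state-discretization and query-caching scheme quoted above can be sketched as follows. This is a minimal illustration; the names (`discretize`, `CachedDescriber`), the grid resolution, and the callable interface for the LLM are assumptions for the sketch, not details from the paper.

```python
# Sketch of LLM query caching with discretized states (illustrative names).
# Nearby states map to the same cache key, so each unique discretized
# state is sent to the language model only once during training.

def discretize(state, resolution=0.5):
    """Round each state dimension onto a fixed grid (resolution assumed)."""
    return tuple(round(x / resolution) * resolution for x in state)

class CachedDescriber:
    def __init__(self, llm_fn, prompt):
        self.llm_fn = llm_fn   # e.g., a temperature-0 gpt-4-turbo chat call
        self.prompt = prompt   # the per-environment prompt l_prompt
        self.cache = {}        # discretized state -> cached description

    def describe(self, state):
        key = discretize(state)
        if key not in self.cache:
            self.cache[key] = self.llm_fn(self.prompt, key)
        return self.cache[key]
```

With temperature 0 the cached answer matches what a repeated query would return, so reuse does not change the measured dlang.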
Researcher Affiliation Academia Seungeun Rho Georgia Institute of Technology EMAIL Laura Smith University of California, Berkeley EMAIL Tianyu Li Georgia Institute of Technology EMAIL Sergey Levine University of California, Berkeley EMAIL Xue Bin Peng Simon Fraser University EMAIL Sehoon Ha Georgia Institute of Technology EMAIL
Pseudocode Yes E FULL ALGORITHM OF LGSD
Algorithm 1 Language Guided Skill Discovery
1: Initialize skill-conditioned policy π(a|s, z), representation function ϕ(s), prompt l_prompt, LLM function LLM, language embedding model f_embed, skill inference network ψ, Lagrange multiplier λ, and data buffer D
2: for i = 1 to # of epochs do
3:   for j = 1 to # of episodes per epoch do
4:     Sample skill z ∼ N(0, I)
5:     while episode not terminated do
6:       Sample action a ∼ π(a|s, z)
7:       Execute a and receive s′
8:       Query LLM(·|s, l_prompt) to produce l_desc(s) and l_desc(s′)
9:       Compute reward r = (ϕ(s′) − ϕ(s))ᵀ z
10:      Compute d_lang(s, s′) using eq. (2)
11:      Compute embedding vector e_s = f_embed(l_desc(s))
12:      Add {s, a, r, s′, d_lang(s, s′), e_s, z} to buffer D
13:    end while
14:  end for
15:  for {s, a, r, s′, d_lang(s, s′), e_s, z} in D do
16:    Update ϕ to maximize E_{(s,z,s′)∼D}[(ϕ(s′) − ϕ(s))ᵀ z + λ min(ϵ, d_lang(s, s′) − ‖ϕ(s) − ϕ(s′)‖₂²)]
17:    Update λ to minimize E_{(s,z,s′)∼D}[λ min(ϵ, d_lang(s, s′) − ‖ϕ(s) − ϕ(s′)‖₂²)]
18:    Update π using PPO with reward r
19:    Update ψ to minimize the mean squared error between ψ(e_s) and z
20:  end for
21: end for
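The two per-transition expressions in Algorithm 1 (the intrinsic reward on line 9 and the constrained representation objective on line 16) can be written out numerically. This is a sketch of those expressions only; the names `phi_s`, `phi_s_next` stand in for ϕ(s) and ϕ(s′), and the default ϵ is an assumption.

```python
import numpy as np

def intrinsic_reward(phi_s, phi_s_next, z):
    """r = (phi(s') - phi(s))^T z  (Algorithm 1, line 9)."""
    return float(np.dot(phi_s_next - phi_s, z))

def constrained_objective(phi_s, phi_s_next, z, d_lang, lam, eps=1e-3):
    """Objective maximized with respect to phi (Algorithm 1, line 16):
    (phi(s') - phi(s))^T z + lam * min(eps, d_lang - ||phi(s) - phi(s')||^2).
    The min(eps, .) clamp keeps the Lagrangian term from dominating once
    the language-distance constraint is satisfied by a margin of eps."""
    slack = d_lang - float(np.sum((phi_s - phi_s_next) ** 2))
    return intrinsic_reward(phi_s, phi_s_next, z) + lam * min(eps, slack)
```

In training these would be evaluated on minibatches from the buffer D and differentiated through ϕ; here they are plain scalar functions for clarity.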
Open Source Code No Reproducibility Statement We have made significant efforts to ensure the reproducibility of our work across various aspects. A comprehensive pseudo-code of our algorithm is available in Appendix E.
Open Datasets No The paper uses environments like "Ant" and "Franka Cube" within the Isaac Gym simulator, which are standard for reinforcement learning. However, it does not explicitly state that any *dataset generated* from these experiments is publicly available, nor does it provide specific access information (links, DOIs, etc.) for any external datasets used, beyond referencing the simulator itself.
Dataset Splits No The paper describes experiments in reinforcement learning environments (Ant, Franka Cube) where agents interact with a simulated environment. This typically does not involve predefined training/test/validation splits of a static dataset in the traditional supervised learning sense. No explicit information on such splits is provided.
Hardware Specification No Experimental setup We trained our algorithm and baselines using Isaac Gym (Makoviychuk et al., 2021), a high-throughput GPU-based physics simulator.
Software Dependencies No For the language model, we employed gpt-4-turbo-2024-04-09 (Achiam et al., 2023). We used PPO (Schulman et al., 2017) as our primary RL algorithm. To measure the difference between two language descriptions, we leverage a pre-trained natural language embedding model, Sentence-Transformer (Reimers & Gurevych, 2019). Table 3 lists optimizers and activation functions such as Adam (Kingma & Ba, 2014), ELU (Clevert et al., 2015), and ReLU. While specific models like 'gpt-4-turbo-2024-04-09' are mentioned, the paper does not list multiple key *software libraries* (e.g., Python, PyTorch) with their specific version numbers.
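The language-distance computation described here can be sketched as below. The exact form of eq. (2) is not reproduced in this report, so the squared-Euclidean distance is an assumption; `embed` stands in for the embedding model f_embed (in practice something like `SentenceTransformer.encode`).

```python
import numpy as np

# Sketch of d_lang between two state descriptions, assuming it is a
# squared Euclidean distance between their embedding vectors (the
# paper defines the actual form in its eq. (2)).

def d_lang(desc_a, desc_b, embed):
    """Distance between two natural-language state descriptions."""
    e_a, e_b = embed(desc_a), embed(desc_b)
    return float(np.sum((e_a - e_b) ** 2))
```

Any embedding function mapping strings to fixed-size vectors can be plugged in for `embed`; with a Sentence-Transformer, semantically similar descriptions yield small distances.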
Experiment Setup Yes Table 3: Hyperparameters of LGSD
Name | Value
Learning rate | 0.0001
Optimizer | Adam (Kingma & Ba, 2014)
Minibatch size | 32768 (Ant), 16384 (Franka)
Horizon length | 32
PPO clip threshold | 0.2
PPO number of epochs | 5
GAE λ (Schulman et al., 2015) | 0.95
Discount factor γ | 0.99
Entropy coefficient | 0.0001
Initial Lagrange coefficient λ | 300
Dim. of skill z | 2 (Ant), 3 (Franka)
Policy network π | MLP with [256, 256, 128]
Activation of π | ELU (Clevert et al., 2015)
Representation function ϕ | MLP with [256, 256, 128]
Activation of ϕ | ReLU
Skill inference network ψ | MLP with [256, 256, 128]
Activation of ψ | ReLU
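For reference, the Ant-column settings of Table 3 can be transcribed into a single config dict. The key names are illustrative; only the values come from the table.

```python
# Transcription of Table 3 (Ant settings); key names are illustrative.
LGSD_ANT_CONFIG = {
    "learning_rate": 1e-4,
    "optimizer": "Adam",
    "minibatch_size": 32768,           # 16384 for Franka
    "horizon_length": 32,
    "ppo_clip": 0.2,
    "ppo_epochs": 5,
    "gae_lambda": 0.95,
    "gamma": 0.99,
    "entropy_coef": 1e-4,
    "initial_lagrange_lambda": 300,
    "skill_dim": 2,                    # 3 for Franka
    "policy_hidden": [256, 256, 128],  # ELU activations
    "phi_hidden": [256, 256, 128],     # ReLU activations
    "psi_hidden": [256, 256, 128],     # ReLU activations
}
```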