Skill Expansion and Composition in Parameter Space

Authors: Tenglong Liu, Jianxiong Li, Yinan Zheng, Haoyi Niu, Yixing Lan, Xin Xu, Xianyuan Zhan

ICLR 2025

Reproducibility Assessment (Variable — Result — LLM Response)
Research Type: Experimental. Evidence: "Empowering diverse applications including multi-objective composition, dynamics shift, and continual policy shift, the results on D4RL, DSRL benchmarks, and the DeepMind Control Suite show that PSEC exhibits superior capacity to leverage prior knowledge to efficiently tackle new challenges, as well as expand its skill libraries to evolve the capabilities. Project website: https://ltlhuuu.github.io/PSEC/."
Researcher Affiliation: Academia. Evidence: "1 National University of Defense Technology, 2 Tsinghua University, 3 Shanghai Artificial Intelligence Laboratory, 4 Beijing Academy of Artificial Intelligence"
Pseudocode: No. The paper does not contain explicitly labeled pseudocode or algorithm blocks; it describes its methodology using mathematical equations and textual descriptions.
Open Source Code: Yes. Evidence: "Project website: https://ltlhuuu.github.io/PSEC/."
Open Datasets: Yes. Evidence: "Empowering diverse settings including multi-objective composition, continual policy shift and dynamics shift, PSEC demonstrates its capacity to evolve and effectively solve new tasks by leveraging prior knowledge, evaluated on the D4RL (Fu et al., 2020), DSRL (Liu et al., 2023a) and DeepMind Control Suite (Tassa et al., 2018), showcasing significant potential for real-world applications."
Dataset Splits: Yes. Evidence: "We use three expert datasets, walker-stand D_e^{T0}, walker-walk D_e^{T1}, and walker-run D_e^{T2}, released by Bai et al. (2024) for policy learning. Specifically, D_e^{T0}, D_e^{T1} and D_e^{T2} contain 1000, 10 and 10 trajectories, respectively."
Hardware Specification: No. The paper mentions training steps and batch sizes (e.g., "We train π0 for 1M gradient steps with a batch size of 2048") but does not specify hardware components such as GPU models, CPU types, or memory.
Software Dependencies: No. The paper mentions the "Adam (Kingma & Ba, 2014)" optimizer, but it does not list the software libraries or frameworks, with version numbers, that would be needed for reproduction.
Experiment Setup: Yes. Evidence: "Table 9: Hyperparameters for multi-objective composition tasks" (shared hyperparameters):
- Normalized state: True
- Target update rate: 1e-3
- Expectile τ: 0.9
- Discount γ: 0.99
- Actor learning rate: 3e-4
- Critic learning rate: 3e-4
- Number of added Gaussian noise T: 5
- Hidden dim: 256
- Hidden layers: 2
- Activation function: ReLU
- Mini-batch size: 2048
- Optimizer: Adam (Kingma & Ba, 2014)
- Training steps: 1e6
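For anyone attempting a reproduction, the hyperparameters reported in Table 9 could be collected into a single configuration object along the following lines. This is a minimal sketch: the class and field names (`PSECConfig`, `expectile_tau`, etc.) are illustrative assumptions, not identifiers from the paper or its codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSECConfig:
    """Shared hyperparameters reported for multi-objective composition (Table 9)."""
    normalized_state: bool = True
    target_update_rate: float = 1e-3   # soft-update coefficient for target networks
    expectile_tau: float = 0.9         # expectile τ
    discount_gamma: float = 0.99       # discount factor γ
    actor_lr: float = 3e-4
    critic_lr: float = 3e-4
    noise_steps: int = 5               # number of added Gaussian noise steps T
    hidden_dim: int = 256
    hidden_layers: int = 2
    activation: str = "relu"
    batch_size: int = 2048
    optimizer: str = "adam"            # Adam (Kingma & Ba, 2014)
    training_steps: int = 1_000_000

# Default instance mirroring the paper's reported values.
config = PSECConfig()
```

A frozen dataclass like this makes the reported settings explicit and immutable, which helps when logging configurations across reruns.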