Skill Expansion and Composition in Parameter Space

Authors: Tenglong Liu, Jianxiong Li, Yinan Zheng, Haoyi Niu, Yixing Lan, Xin Xu, Xianyuan Zhan

ICLR 2025

Reproducibility Assessment (Variable — Result — LLM Response)
Research Type: Experimental. Evidence: "Empowering diverse applications including multi-objective composition, dynamics shift, and continual policy shift, the results on D4RL, DSRL benchmarks, and the DeepMind Control Suite show that PSEC exhibits superior capacity to leverage prior knowledge to efficiently tackle new challenges, as well as expand its skill libraries to evolve the capabilities. Project website: https://ltlhuuu.github.io/PSEC/."
Researcher Affiliation: Academia. Evidence: "1 National University of Defense Technology, 2 Tsinghua University, 3 Shanghai Artificial Intelligence Laboratory, 4 Beijing Academy of Artificial Intelligence"
Pseudocode: No. The paper does not contain explicitly labeled pseudocode or algorithm blocks; it describes its methodology using mathematical equations and textual descriptions.
Open Source Code: Yes. Evidence: "Project website: https://ltlhuuu.github.io/PSEC/."
Open Datasets: Yes. Evidence: "Empowering diverse settings including multi-objective composition, continual policy shift and dynamics shift, PSEC demonstrates its capacity to evolve and effectively solve new tasks by leveraging prior knowledge, evaluated on the D4RL (Fu et al., 2020), DSRL (Liu et al., 2023a) and DeepMind Control Suite (Tassa et al., 2018), showcasing significant potential for real-world applications."
Dataset Splits: Yes. Evidence: "We use three expert datasets, walker-stand D_e^{T0}, walker-walk D_e^{T1}, and walker-run D_e^{T2}, released by Bai et al. (2024) for policy learning. Specifically, D_e^{T0}, D_e^{T1} and D_e^{T2} contain 1000, 10 and 10 trajectories, respectively."
Hardware Specification: No. The paper mentions training steps and batch sizes (e.g., "We train π0 for 1M gradient steps with a batch size of 2048") but does not specify hardware components such as GPU models, CPU types, or memory.
Software Dependencies: No. The paper mentions the "Adam (Kingma & Ba, 2014)" optimizer, but it does not list the software libraries or frameworks, with version numbers, that would be needed for reproduction.
Experiment Setup: Yes. Evidence: "Table 9: Hyperparameters for multi-objective composition tasks" (shared hyperparameters):
- Normalized state: True
- Target update rate: 1e-3
- Expectile τ: 0.9
- Discount γ: 0.99
- Actor learning rate: 3e-4
- Critic learning rate: 3e-4
- Number of added Gaussian noise T: 5
- Hidden dim: 256
- Hidden layers: 2
- Activation function: ReLU
- Mini-batch size: 2048
- Optimizer: Adam (Kingma & Ba, 2014)
- Training steps: 1e6
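For anyone attempting a reproduction, the hyperparameters reported in Table 9 could be collected into a single configuration object along the following lines. This is a minimal sketch: the class and field names (`PSECConfig`, `expectile_tau`, etc.) are illustrative assumptions, not identifiers from the paper or its codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSECConfig:
    """Shared hyperparameters reported for multi-objective composition (Table 9)."""
    normalized_state: bool = True
    target_update_rate: float = 1e-3   # soft-update coefficient for target networks
    expectile_tau: float = 0.9         # expectile τ
    discount_gamma: float = 0.99       # discount factor γ
    actor_lr: float = 3e-4
    critic_lr: float = 3e-4
    noise_steps: int = 5               # number of added Gaussian noise steps T
    hidden_dim: int = 256
    hidden_layers: int = 2
    activation: str = "relu"
    batch_size: int = 2048
    optimizer: str = "adam"            # Adam (Kingma & Ba, 2014)
    training_steps: int = 1_000_000

# Default instance mirroring the paper's reported values.
config = PSECConfig()
```

A frozen dataclass like this makes the reported settings explicit and immutable, which helps when logging configurations across reruns.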