Skill Expansion and Composition in Parameter Space
Authors: Tenglong Liu, Jianxiong Li, Yinan Zheng, Haoyi Niu, Yixing Lan, Xin Xu, Xianyuan Zhan
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empowering diverse applications including multi-objective composition, dynamics shift, and continual policy shift, the results on D4RL, DSRL benchmarks, and the Deep Mind Control Suite show that PSEC exhibits superior capacity to leverage prior knowledge to efficiently tackle new challenges, as well as expand its skill libraries to evolve the capabilities. Project website: https://ltlhuuu.github.io/PSEC/. |
| Researcher Affiliation | Academia | 1 National University of Defense Technology; 2 Tsinghua University; 3 Shanghai Artificial Intelligence Laboratory; 4 Beijing Academy of Artificial Intelligence |
| Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. It describes methodologies using mathematical equations and textual descriptions. |
| Open Source Code | Yes | Project website: https://ltlhuuu.github.io/PSEC/. |
| Open Datasets | Yes | Empowering diverse settings including multi-objective composition, continual policy shift and dynamics shift, PSEC demonstrates its capacity to evolve and effectively solve new tasks by leveraging prior knowledge, evaluated on the D4RL (Fu et al., 2020), DSRL (Liu et al., 2023a) and Deep Mind Control Suite (Tassa et al., 2018), showcasing significant potential for real-world applications. |
| Dataset Splits | Yes | We use three expert datasets, walker-stand (D_e^{T0}), walker-walk (D_e^{T1}), and walker-run (D_e^{T2}), released by Bai et al. (2024) for policy learning. Specifically, D_e^{T0}, D_e^{T1}, and D_e^{T2} contain 1000, 10, and 10 trajectories, respectively. |
| Hardware Specification | No | The paper mentions training steps and batch sizes, e.g., "We train π0 for 1M gradient steps with a batch size of 2048", but does not specify any particular hardware components like GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions using "Optimizer Adam (Kingma & Ba, 2014)", but it does not specify any software libraries or frameworks with version numbers that would be needed for reproduction. |
| Experiment Setup | Yes | Table 9: Hyperparameters for multi-objective composition tasks (shared hyperparameters) — Normalized state: True; Target update rate: 1e-3; Expectile τ: 0.9; Discount γ: 0.99; Actor learning rate: 3e-4; Critic learning rate: 3e-4; Number of added Gaussian noise steps T: 5; Hidden dim: 256; Hidden layers: 2; Activation function: ReLU; Mini-batch size: 2048; Optimizer: Adam (Kingma & Ba, 2014); Training steps: 1e6 |
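The Table 9 values quoted in the final row can be collected into a single configuration object. The sketch below is illustrative only: the key names are assumptions, not identifiers from the authors' released code, but the values match the reported hyperparameters.

```python
# Hypothetical config dict mirroring Table 9 of the PSEC paper
# (multi-objective composition tasks, shared hyperparameters).
# Key names are assumed for illustration; values come from the paper.
HYPERPARAMS = {
    "normalized_state": True,       # normalize state inputs
    "target_update_rate": 1e-3,     # soft target-network update rate
    "expectile_tau": 0.9,           # expectile for value regression
    "discount_gamma": 0.99,         # RL discount factor
    "actor_lr": 3e-4,               # actor learning rate (Adam)
    "critic_lr": 3e-4,              # critic learning rate (Adam)
    "noise_steps_T": 5,             # number of added Gaussian noise steps
    "hidden_dim": 256,              # MLP hidden width
    "hidden_layers": 2,             # MLP depth
    "activation": "ReLU",           # activation function
    "batch_size": 2048,             # mini-batch size
    "optimizer": "Adam",            # Kingma & Ba (2014)
    "training_steps": int(1e6),     # total gradient steps
}

if __name__ == "__main__":
    for name, value in HYPERPARAMS.items():
        print(f"{name}: {value}")
```

Keeping every reported hyperparameter in one place like this makes a reproduction attempt auditable against the paper's Table 9 at a glance.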