ODE-based Smoothing Neural Network for Reinforcement Learning Tasks
Authors: Yinuo Wang, Wenxuan Wang, Xujie Song, Tong Liu, Yuming Yin, Liangfa Chen, Likun Wang, Jingliang Duan, Shengbo Eben Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Various experiments show that our SmODE network demonstrates superior anti-interference capabilities and smoother action outputs than the multilayer perceptron and smooth network architectures like LipsNet. The experimental results obtained from applying the proposed method are reported in Section 4. |
| Researcher Affiliation | Academia | Yinuo Wang1, Wenxuan Wang1, Xujie Song1, Tong Liu1, Yuming Yin2, Liangfa Chen3, Likun Wang1, Jingliang Duan1,3, Shengbo Eben Li1. 1 School of Vehicle and Mobility & College of AI, Tsinghua University; 2 School of Mechanical Engineering, Zhejiang University of Technology; 3 School of Mechanical Engineering, University of Science and Technology Beijing. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | The pseudocode of SmODE-based RL is illustrated in Algorithm 1. (Section 3.3) and Algorithm 1 Training method of SmODE-based RL. (Appendix B) |
| Open Source Code | Yes | To accelerate adoption and further research, we have encapsulated SmODE as a PyTorch module, with the code available in the attached files. |
| Open Datasets | Yes | In this study, ten types of experimental environments are adopted to validate the efficacy of the SmODE network: a vehicle trajectory tracking task, a linear quadratic regulator problem, and eight robotic control tasks in MuJoCo (Todorov et al., 2012). MuJoCo is a benchmark RL environment that integrates several robot control tasks. The specific simulation tasks, depicted in Fig. 4, include Humanoid, Pusher, Hopper, Reacher, Walker2d, Ant, Inverted Double Pendulum and Car Racing. |
| Dataset Splits | No | The paper describes experimental environments and noise levels (e.g., Table 2), and mentions results are averaged over 'five seeds over 1 million training steps' (Section 4.3). However, it does not provide specific training/validation/test splits for fixed datasets, as is common in supervised learning. In reinforcement learning, data is dynamically generated through interaction with the environment. |
| Hardware Specification | Yes | All experiments were conducted on eight AMD Ryzen Threadripper 3960X 24-core processors with 128 GB of RAM each. |
| Software Dependencies | No | The paper mentions 'SmODE as a PyTorch module' (Section 1) and utilizes algorithms like 'INFADP' (Li (2023)) and 'DSAC' (Duan et al. (2021)), as well as 'MuJoCo' (Todorov et al. (2012)) as an environment. However, it does not provide specific version numbers for PyTorch, GOPS, MuJoCo, or any other software libraries or tools used, which is required for reproducibility. |
| Experiment Setup | Yes | Appendix F (Training Details): In MuJoCo tasks, hyperparameters unrelated to SmODE were consistent with those in the DSAC paper. The only parameters that needed adjustment were λ1 and λ2, as well as the number of neurons in the three-layer network of the smooth ODE module. The variables λ1 and λ2 were tuned using a controlled-variable method to find relatively optimal results. The configuration of the neuron numbers in the smooth ODE follows the rule that the number of neurons in the second and third layers equals the dimensionality of the environment actions, and the number of neurons in the first layer is greater than that of the latter two layers. (Appendix F) Also, Tables 8, 9, 10, and 11 provide detailed algorithm hyperparameters such as replay buffer capacity, batch size, discount γ, learning rates, and the specific weights λ1 and λ2. |
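The layer-sizing rule quoted above can be sketched as a small helper. This is a minimal illustration, not code from the paper: the function name, the strict-inequality check, and the default first-layer width (twice the action dimension) are all assumptions; the paper only states that the first layer must be larger than the other two.

```python
def smode_layer_sizes(action_dim, first_layer_width=None):
    """Neuron counts for the three-layer smooth ODE module, per the rule
    quoted from Appendix F: layers 2 and 3 match the action dimension,
    and layer 1 has more neurons than either.

    Hypothetical helper -- the name and the default width of
    2 * action_dim are illustrative assumptions, not from the paper.
    """
    if first_layer_width is None:
        # Assumed default; the paper only requires "greater than".
        first_layer_width = 2 * action_dim
    if first_layer_width <= action_dim:
        raise ValueError(
            "first layer must be wider than the action dimension"
        )
    return [first_layer_width, action_dim, action_dim]


# Example: the MuJoCo Walker2d action space is 6-dimensional.
print(smode_layer_sizes(6))  # [12, 6, 6]
```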