ODE-based Smoothing Neural Network for Reinforcement Learning Tasks
Authors: Yinuo Wang, Wenxuan Wang, Xujie Song, Tong Liu, Yuming Yin, Liangfa Chen, Likun Wang, Jingliang Duan, Shengbo Eben Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Various experiments show that our SmODE network demonstrates superior anti-interference capabilities and smoother action outputs than the multilayer perceptron and smooth network architectures like LipsNet. The experimental results obtained from applying the proposed method are reported in Section 4. |
| Researcher Affiliation | Academia | Yinuo Wang1, Wenxuan Wang1, Xujie Song1, Tong Liu1, Yuming Yin2, Liangfa Chen3, Likun Wang1, Jingliang Duan1,3, Shengbo Eben Li1. 1 School of Vehicle and Mobility & College of AI, Tsinghua University; 2 School of Mechanical Engineering, Zhejiang University of Technology; 3 School of Mechanical Engineering, University of Science and Technology Beijing. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | The pseudocode of SmODE-based RL is illustrated in Algorithm 1. (Section 3.3) and Algorithm 1 Training method of SmODE-based RL. (Appendix B) |
| Open Source Code | Yes | To accelerate adoption and further research, we have encapsulated SmODE as a PyTorch module, with the code available in the attached files. |
| Open Datasets | Yes | In this study, ten types of experimental environments are adopted to validate the efficacy of the SmODE network: a vehicle trajectory tracking task, a linear quadratic regulator problem, and eight robotic control tasks in MuJoCo (Todorov et al., 2012). MuJoCo is a benchmark RL environment that integrates several robot control tasks. The specific simulation tasks, depicted in Fig. 4, include Humanoid, Pusher, Hopper, Reacher, Walker2d, Ant, Inverted Double Pendulum and Car Racing. |
| Dataset Splits | No | The paper describes experimental environments and noise levels (e.g., Table 2), and mentions results are averaged over 'five seeds over 1 million training steps' (Section 4.3). However, it does not provide specific training/validation/test splits for fixed datasets, as is common in supervised learning. In reinforcement learning, data is dynamically generated through interaction with the environment. |
| Hardware Specification | Yes | All experiments were conducted on eight AMD Ryzen Threadripper 3960X 24-core processors with 128 GB of RAM each. |
| Software Dependencies | No | The paper mentions 'SmODE as a PyTorch module' (Section 1) and utilizes algorithms like 'INFADP' (Li (2023)) and 'DSAC' (Duan et al. (2021)), as well as 'MuJoCo' (Todorov et al. (2012)) as an environment. However, it does not provide specific version numbers for PyTorch, GOPS, MuJoCo, or any other software libraries or tools used, which is required for reproducibility. |
| Experiment Setup | Yes | Appendix F (Training Details): In MuJoCo tasks, hyperparameters unrelated to SmODE were consistent with those in the DSAC paper. The only parameters that needed adjustment were λ1 and λ2, as well as the number of neurons in the three-layer network of the smooth ODE module. The variables λ1 and λ2 were tuned using a controlled-variable method to find relatively optimal results. The configuration of the neuron numbers in the smooth ODE follows the rule that the number of neurons in the second and third layers equals the dimensionality of the environment actions, and the number of neurons in the first layer is greater than that of the latter two layers. (Appendix F) Also, Tables 8, 9, 10, and 11 provide detailed algorithm hyperparameters such as replay buffer capacity, batch size, discount γ, learning rates, and the specific weights λ1 and λ2. |
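The layer-sizing rule quoted above can be sketched as a small helper. This is a minimal illustration, not code from the paper: the function name, the strict-inequality check, and the default first-layer width (twice the action dimension) are all assumptions; the paper only states that the first layer must be larger than the other two.

```python
def smode_layer_sizes(action_dim, first_layer_width=None):
    """Neuron counts for the three-layer smooth ODE module, per the rule
    quoted from Appendix F: layers 2 and 3 match the action dimension,
    and layer 1 has more neurons than either.

    Hypothetical helper -- the name and the default width of
    2 * action_dim are illustrative assumptions, not from the paper.
    """
    if first_layer_width is None:
        # Assumed default; the paper only requires "greater than".
        first_layer_width = 2 * action_dim
    if first_layer_width <= action_dim:
        raise ValueError(
            "first layer must be wider than the action dimension"
        )
    return [first_layer_width, action_dim, action_dim]


# Example: the MuJoCo Walker2d action space is 6-dimensional.
print(smode_layer_sizes(6))  # [12, 6, 6]
```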