Online Control-Informed Learning

Authors: Zihao Liang, Tianyu Zhou, Zehui Lu, Shaoshuai Mou

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Three learning modes of OCIL, i.e., Online Imitation Learning, Online System Identification, and Policy Tuning On-the-fly, are investigated via experiments, which validate their effectiveness.
Researcher Affiliation | Academia | Zihao Liang, Tianyu Zhou, Zehui Lu, and Shaoshuai Mou, all affiliated with the School of Aeronautics and Astronautics, Purdue University.
Pseudocode | Yes | Algorithm 1: Gradient Generator (GG); Algorithm 2: Online Control-Informed Learning.
Open Source Code | No | The code is implemented in Python, using the CasADi library with the IPOPT solver to solve the OC problem. While these implementation details are mentioned, there is no explicit statement about making the code open source, nor a link to a repository.
Open Datasets | No | The paper describes generating its own data for experiments. For Online Imitation Learning, it states: "The dataset of expert demonstrations is generated by solving an optimal control system with the true dynamics and control objective parameter θ = {θ_dyn, θ_obj} given. We generate five trajectories with different initial conditions x_0 and time horizons T." Similarly for Online System Identification: "we collect a total number of five trajectories from systems with dynamics known, wherein different trajectories ξ^o = {x^o_{0:T}, u_{0:T-1}} have different initial conditions x_0 and horizons T..." No external public datasets are explicitly used or referenced with access information.
Dataset Splits | No | The paper describes how trajectories were generated (e.g., "five trajectories with different initial conditions") and mentions an 'online phase' and an 'offline phase' for learning. However, it does not specify explicit training, validation, or test splits with percentages, sample counts, or references to predefined splits in the traditional machine learning sense. The phases refer to the timing of data availability for learning, not a division of a fixed dataset for evaluation.
Hardware Specification | Yes | The experiments with OCIL were performed on a desktop with one Intel Core i7-8700K CPU and 8 GB RAM; no GPU was used. The experiments with the other methods were performed on a desktop with one AMD Ryzen 9 5900X CPU, one NVIDIA GeForce RTX 4070 Ti, and 32 GB RAM.
Software Dependencies | No | The code is implemented in Python, using the CasADi library with the IPOPT solver to solve the OC problem. While Python, CasADi, and IPOPT are named, specific version numbers for these software components are not provided.
Experiment Setup | Yes | The learning rate is η = 10^-4. Five trials were run with random initial θ_0. For the neural dynamics case, the learning rate is η = 10^-5. The neural network has a layer structure of (n + m)-2(n + m)-n with tanh activation functions, i.e., an input layer with (n + m) neurons matching the combined state and control dimension, one hidden layer with 2(n + m) neurons, and one output layer with n neurons. For neural policy cloning, a neural network policy u = µ(x, θ) is learned directly from the dataset via supervised learning, using a fully connected feed-forward network with a layer structure of 3n-3n-m and tanh activation functions, i.e., an input layer with 3n neurons, one hidden layer with 3n neurons, and one output layer with m neurons.
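The (n + m)-2(n + m)-n neural dynamics architecture described in the setup row can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code: the weight initialization, the `neural_dynamics` helper name, and reading the output as the predicted state update are all assumptions; only the layer widths and the tanh hidden activation come from the paper's description.

```python
import math
import random

def init_layer(fan_in, fan_out, rng):
    # Hypothetical small uniform initialization; biases start at zero.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(fan_in)] for _ in range(fan_out)]
    biases = [0.0] * fan_out
    return weights, biases

def dense(weights, biases, x, activation=None):
    # One fully connected layer: y = W x + b, optionally followed by tanh.
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [math.tanh(v) for v in out] if activation == "tanh" else out

def neural_dynamics(n, m, seed=0):
    """(n + m)-2(n + m)-n network with a tanh hidden layer, as described."""
    rng = random.Random(seed)
    W1, b1 = init_layer(n + m, 2 * (n + m), rng)   # input -> hidden
    W2, b2 = init_layer(2 * (n + m), n, rng)       # hidden -> output
    def f(x, u):
        h = dense(W1, b1, list(x) + list(u), activation="tanh")
        return dense(W2, b2, h)  # n-dimensional output (predicted state term)
    return f

n, m = 3, 2                      # example state and control dimensions
f = neural_dynamics(n, m)
x_pred = f([0.1] * n, [0.0] * m)
assert len(x_pred) == n
```

The 3n-3n-m policy network for neural policy cloning follows the same two-layer pattern with the widths swapped accordingly.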