Online Control-Informed Learning

Authors: Zihao Liang, Tianyu Zhou, Zehui Lu, Shaoshuai Mou

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Three learning modes of OCIL, i.e., Online Imitation Learning, Online System Identification, and Policy Tuning On-the-fly, are investigated via experiments, which validate their effectiveness.
Researcher Affiliation | Academia | Zihao Liang, Tianyu Zhou, Zehui Lu, and Shaoshuai Mou, all affiliated with the School of Aeronautics and Astronautics, Purdue University.
Pseudocode | Yes | Algorithm 1: Gradient Generator (GG); Algorithm 2: Online Control-Informed Learning.
Open Source Code | No | The code is implemented in Python, using the CasADi library with the IPOPT solver to solve the OC problem. While these implementation details are mentioned, there is no explicit statement about making the code open source, nor a link to a repository.
Open Datasets | No | The paper describes generating its own data for experiments. For Online Imitation Learning, it states: "The dataset of expert demonstrations is generated by solving an optimal control system with the true dynamics and control objective parameter θ = {θ_dyn, θ_obj} given. We generate five trajectories with different initial conditions x_0 and time horizons T." Similarly for Online System Identification: "we collect a total number of five trajectories from systems with dynamics known, wherein different trajectories ξ^o = {x^o_{0:T}, u_{0:T-1}} have different initial conditions x_0 and horizons T..." No external public datasets are explicitly used or referenced with access information.
Dataset Splits | No | The paper describes how trajectories were generated (e.g., "five trajectories with different initial conditions") and mentions an 'online phase' and an 'offline phase' for learning. However, it does not specify explicit training, validation, or test splits with percentages, sample counts, or references to predefined splits in the traditional machine learning sense. The phases refer to the timing of data availability for learning, not a division of a fixed dataset for evaluation.
Hardware Specification | Yes | The experiments with OCIL were performed on a desktop with one Intel Core i7-8700K CPU and 8 GB RAM; no GPU was used. The experiments with the other methods were performed on a desktop with one AMD Ryzen 9 5900X CPU, one NVIDIA GeForce RTX 4070 Ti, and 32 GB RAM.
Software Dependencies | No | The code is implemented in Python, using the CasADi library with the IPOPT solver to solve the OC problem. While Python, CasADi, and IPOPT are named, specific version numbers for these software components are not provided.
Experiment Setup | Yes | The learning rate is η = 10^-4. Five trials were run with random initial θ_0. For the neural dynamics case, the learning rate is η = 10^-5. The neural network has a layer structure of (n + m)-2(n + m)-n with tanh activation functions, i.e., an input layer with (n + m) neurons matching the combined state and control dimension, one hidden layer with 2(n + m) neurons, and one output layer with n neurons. For neural policy cloning, a neural network policy u = µ(x, θ) is learned directly from the dataset via supervised learning, using a fully connected feed-forward network with a layer structure of 3n-3n-m and tanh activation functions, i.e., an input layer with 3n neurons, one hidden layer with 3n neurons, and one output layer with m neurons.
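The (n + m)-2(n + m)-n neural dynamics architecture described in the setup row can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code: the weight initialization, the `neural_dynamics` helper name, and reading the output as the predicted state update are all assumptions; only the layer widths and the tanh hidden activation come from the paper's description.

```python
import math
import random

def init_layer(fan_in, fan_out, rng):
    # Hypothetical small uniform initialization; biases start at zero.
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(fan_in)] for _ in range(fan_out)]
    biases = [0.0] * fan_out
    return weights, biases

def dense(weights, biases, x, activation=None):
    # One fully connected layer: y = W x + b, optionally followed by tanh.
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [math.tanh(v) for v in out] if activation == "tanh" else out

def neural_dynamics(n, m, seed=0):
    """(n + m)-2(n + m)-n network with a tanh hidden layer, as described."""
    rng = random.Random(seed)
    W1, b1 = init_layer(n + m, 2 * (n + m), rng)   # input -> hidden
    W2, b2 = init_layer(2 * (n + m), n, rng)       # hidden -> output
    def f(x, u):
        h = dense(W1, b1, list(x) + list(u), activation="tanh")
        return dense(W2, b2, h)  # n-dimensional output (predicted state term)
    return f

n, m = 3, 2                      # example state and control dimensions
f = neural_dynamics(n, m)
x_pred = f([0.1] * n, [0.0] * m)
assert len(x_pred) == n
```

The 3n-3n-m policy network for neural policy cloning follows the same two-layer pattern with the widths swapped accordingly.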