Consensus Based Stochastic Optimal Control

Authors: Liyao Lyu, Jingrun Chen

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical results confirm the accuracy and scalability of our approach across various problem dimensions and show the potential for extension to mean-field control problems. ... We evaluate the performance of the Adam-CBO method across various problem settings, including the linear quadratic control problem in 1, 2, 4, 8, and 16 dimensions, the Ginzburg-Landau model, and the systemic risk mean-field control problem with 50, 100, 200, 400, and 800 agents.
Researcher Affiliation | Academia | (1) Department of Computational Mathematics, Science & Engineering, Michigan State University, MI 48824, USA; (2) School of Mathematical Sciences and Suzhou Institute for Advanced Research, University of Science and Technology of China, and Suzhou Big Data & AI Research and Engineering Center, Suzhou 215127, China.
Pseudocode | Yes | Algorithm 1: Consensus-Based Optimization with Momentum; Algorithm 2: Consensus-Based Optimization with Adaptive Momentum.
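Neither algorithm is reproduced here. As a rough illustration only, the sketch below shows a generic consensus-based optimizer with heavy-ball momentum: particles are pulled toward a Gibbs-weighted consensus point and perturbed by noise proportional to their distance from it. The momentum form, the function names `consensus_point` and `m_cbo`, and all default parameters are hypothetical assumptions, not the authors' Algorithm 1 or Algorithm 2. The optional `batch` argument is one possible reading of updating only a random subset of agents per step, as in the M = 50 of N = 5000 setting listed under Experiment Setup.

```python
import numpy as np

def consensus_point(x, fx, beta):
    # Gibbs-weighted average of particle positions, weights exp(-beta * f).
    # Subtracting min(f) keeps the largest weight at 1 and avoids overflow.
    w = np.exp(-beta * (fx - fx.min()))
    return (w[:, None] * x).sum(axis=0) / w.sum()

def m_cbo(f, x0, steps=500, lam=1.0, sigma=0.5, beta=50.0, gamma=0.9,
          dt=0.05, batch=None, seed=0):
    """Hypothetical CBO with heavy-ball momentum (illustrative sketch only).

    f      : maps an (n, d) array of particles to an (n,) array of values
    x0     : (n, d) initial particle positions
    gamma  : momentum coefficient (assumed heavy-ball form)
    batch  : if set, only this many randomly chosen particles move per step
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n, _ = x.shape
    v = np.zeros_like(x)  # per-particle momentum
    for _ in range(steps):
        xbar = consensus_point(x, f(x), beta)
        # Random-batch option: choose a subset of particles to update.
        idx = rng.choice(n, size=batch, replace=False) if batch else np.arange(n)
        diff = x[idx] - xbar
        # Momentum accumulates the drift toward the consensus point.
        v[idx] = gamma * v[idx] - lam * diff * dt
        # Component-wise exploration noise that vanishes as particles reach consensus.
        noise = rng.standard_normal(diff.shape)
        x[idx] = x[idx] + v[idx] + sigma * np.abs(diff) * np.sqrt(dt) * noise
    return consensus_point(x, f(x), beta)
```

For example, on a shifted quadratic $f(x) = \|x - 1\|^2$ with 200 particles drawn uniformly from $[-3, 3]^2$, the returned consensus point should land near $(1, 1)$. Because the Gibbs weights concentrate on the best particles, the final accuracy depends on how well the initial cloud covers the minimizer.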
Open Source Code | Yes | Our code is available at https://github.com/Lyuliyao/ADAM_CBO_control.
Open Datasets | Yes | We also compare our method with DDPG, PPO, SAC, TD3, TQC, and CrossQ (using the stable-baselines3 implementation https://github.com/araffin/sbx) on Pendulum-v1, as well as PPO and DQN on CartPole-v1.
Dataset Splits | Yes | The control policy is initially trained using a delta distribution centered on $x_0$ with $n = 100$ agents and then tested with $n = 50, 100, 200, 400, 800$. Furthermore, the value function is evaluated by taking the expectation of the controlled dynamics starting from different initial distributions $\mu_0$, including a Gaussian random variable $x_0 \sim \mathcal{N}(0, 0.1)$; a mixture of two Gaussian random variables $x_0 = P(-k + \theta y) + (1 - P)(k + \theta z)$, with $P$ a Bernoulli random variable with parameter 1/3 10, $\theta = 0.1$, and $y, z \sim \mathcal{N}(0, 1)$; and a mixture of three Gaussian random variables $x_0 = [-k\,\mathbb{1}_{U=0} + k\,\mathbb{1}_{U=1}] + \theta y$, with $k = 0.3$, $\theta = 0.07$, and $y \sim \mathcal{N}(0, 1)$.
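The three test distributions for $x_0$ can be sampled roughly as follows. This is a minimal NumPy sketch under stated assumptions: the 0.1 in $\mathcal{N}(0, 0.1)$ is treated as a variance; the two-component mixture is assumed centered at $-k$ and $k$, with $k$ and the Bernoulli parameter $p$ passed in by the caller because the text does not state them unambiguously; and $U$ in the three-component mixture is assumed uniform on $\{0, 1, 2\}$, giving components centered at $-k$, $k$, and $0$. The function names are hypothetical.

```python
import numpy as np

def sample_gaussian(n, rng, var=0.1):
    # x0 ~ N(0, var); reading 0.1 as a variance is an assumption.
    return rng.normal(0.0, np.sqrt(var), size=n)

def sample_two_mixture(n, rng, k, p, theta=0.1):
    # x0 = P(-k + theta*y) + (1 - P)(k + theta*z), P ~ Bernoulli(p), y, z ~ N(0, 1).
    # The -k / +k centering and the caller-supplied k, p are assumptions.
    P = rng.random(n) < p
    y = rng.standard_normal(n)
    z = rng.standard_normal(n)
    return np.where(P, -k + theta * y, k + theta * z)

def sample_three_mixture(n, rng, k=0.3, theta=0.07):
    # x0 = -k*1{U=0} + k*1{U=1} + theta*y, y ~ N(0, 1).
    # U uniform on {0, 1, 2} is an assumption (U = 2 gives the component at 0).
    U = rng.integers(0, 3, size=n)
    y = rng.standard_normal(n)
    return -k * (U == 0) + k * (U == 1) + theta * y
```

Each function returns a 1-D array of $n$ initial states, so the same controlled dynamics can be simulated from each $\mu_0$ and the resulting costs averaged.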
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. It focuses on the algorithmic and software aspects of the experiments.
Software Dependencies | No | We also compare our method with DDPG, PPO, SAC, TD3, TQC, and CrossQ (using the stable-baselines3 implementation https://github.com/araffin/sbx) on Pendulum-v1, as well as PPO and DQN on CartPole-v1. While stable-baselines3 is mentioned, no version numbers are given for it or for any other software dependency.
Experiment Setup | Yes | In both methods, the number of SDE sample paths used to compute the value function is 64 and the learning rate is 1e-2. In the M-CBO and Adam-CBO methods, the number of agents is N = 5000, and M = 50 agents are randomly selected for update at each step. We investigate the LQG problem in dimensions d = 1, 2, 4, 8, and 16, with terminal time T = 1 and timestep T/20. ... We start with a simple case with d = 2, µ = 10, λ = 0.2. ... We test the performance of our method with parameters c = 2, k = 0.6, and η = 2.
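To illustrate what evaluating a policy with 64 SDE sample paths and timestep T/20 involves, the sketch below estimates a control cost by Euler–Maruyama Monte Carlo. The dynamics $dX_t = u_t\,dt + \sigma\,dW_t$, the quadratic running and terminal cost, the linear feedback $u = -\text{gain}\cdot x$, and the name `estimate_cost` are hypothetical stand-ins, not the paper's LQG formulation. A consensus-based optimizer would treat an estimate like this as its objective over policy parameters; fixing the seed gives common random numbers, so two policies are compared on the same noise.

```python
import numpy as np

def estimate_cost(gain, d=2, n_paths=64, T=1.0, n_steps=20,
                  sigma=np.sqrt(2.0), x0=None, seed=0):
    """Monte Carlo estimate of a hypothetical LQ cost under u = -gain * x.

    Cost per path: sum_k (|X_k|^2 + |u_k|^2) * dt + |X_T|^2  (left-point rule).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    # All paths start from the same point (origin by default).
    x = np.zeros((n_paths, d)) if x0 is None else np.tile(x0, (n_paths, 1))
    running = np.zeros(n_paths)
    for _ in range(n_steps):
        u = -gain * x  # linear state feedback
        running += ((x ** 2).sum(axis=1) + (u ** 2).sum(axis=1)) * dt
        # Euler-Maruyama step for dX = u dt + sigma dW.
        x = x + u * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    terminal = (x ** 2).sum(axis=1)
    return float((running + terminal).mean())
```

As a sanity check, with zero gain and $x_0 = 0$ the state is a scaled random walk with $\mathbb{E}|X_{t_k}|^2 = d\sigma^2 t_k$, so for $d = 2$, $\sigma^2 = 2$ the expected cost is $1.9 + 4 = 5.9$, which a run with many paths should approach. A stabilizing feedback such as `gain=1.0` should give a lower estimate.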