Consensus Based Stochastic Optimal Control
Authors: Liyao Lyu, Jingrun Chen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results confirm the accuracy and scalability of our approach across various problem dimensions and show the potential for extension to mean-field control problems. ... We evaluate the performance of the Adam-CBO method across various problem settings, including the linear quadratic control problem in 1, 2, 4, 8, and 16 dimensions, the Ginzburg-Landau model, and the systemic risk mean-field control problem with 50, 100, 200, 400, and 800 agents. |
| Researcher Affiliation | Academia | ¹Department of Computational Mathematics, Science & Engineering, Michigan State University, MI 48824, USA; ²School of Mathematical Sciences and Suzhou Institute for Advanced Research, University of Science and Technology of China, and Suzhou Big Data & AI Research and Engineering Center, Suzhou 215127, China. |
| Pseudocode | Yes | Algorithm 1: Consensus-Based Optimization with Momentum; Algorithm 2: Consensus-Based Optimization with Adaptive Momentum |
| Open Source Code | Yes | Our code is available at https://github.com/Lyuliyao/ADAM_CBO_control. |
| Open Datasets | Yes | We also compare our method with DDPG, PPO, SAC, TD3, TQC, and CrossQ (using the stable-baselines3 implementation at https://github.com/araffin/sbx) on Pendulum-v1, as well as PPO and DQN on CartPole-v1. |
| Dataset Splits | Yes | The control policy is first trained with the initial distribution set to a delta distribution centered at $x_0$ and $n = 100$ agents, and then tested with $n = 50, 100, 200, 400, 800$. Furthermore, the value function is evaluated by taking the expectation of the controlled dynamics starting from different initial distributions $\mu_0$: a Gaussian random variable, $x_0 \sim \mathcal{N}(0, 0.1)$; a mixture of two Gaussian random variables, $x_0 = P(-k + \theta y) + (1 - P)(k + \theta z)$, with $P$ a Bernoulli random variable with parameter 1/3 10, $\theta = 0.1$, and $y, z \sim \mathcal{N}(0, 1)$; and a mixture of three Gaussian random variables, $x_0 = [-k\,\mathbb{1}_{\{3U=0\}} + k\,\mathbb{1}_{\{3U=1\}}] + \theta y$, with $k = 0.3$, $\theta = 0.07$, and $y \sim \mathcal{N}(0, 1)$. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU or CPU models. It focuses on the algorithmic and software aspects of the experiments. |
| Software Dependencies | No | We also compare our method with DDPG, PPO, SAC, TD3, TQC, and CrossQ (using the stable-baselines3 implementation at https://github.com/araffin/sbx) on Pendulum-v1, as well as PPO and DQN on CartPole-v1. While 'stable-baselines3' is mentioned, no version numbers are given for it or for any other software dependency. |
| Experiment Setup | Yes | In both methods, the number of SDE sample paths used to compute the value function is 64 and the learning rate is 1e-2. In the M-CBO and Adam-CBO methods, the number of agents is N = 5000, and M = 50 randomly selected agents are updated at each step. We investigate the LQG problem in dimensions d = 1, 2, 4, 8, and 16, with terminal time T = 1 and time step T/20. ... We start with a simple case with d = 2, µ = 10, λ = 0.2. ... We test the performance of our method with parameters c = 2, k = 0.6, and η = 2. |
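The Pseudocode row lists consensus-based optimization (CBO) with momentum and with adaptive momentum (Adam-CBO), and the setup row uses N = 5000 agents with M = 50 updated per step. Below is a minimal NumPy sketch of a random-batch CBO loop with an Adam-style rescaled drift, not the authors' Algorithm 2. It assumes the standard Gibbs-weighted consensus point, a consensus point computed from the sampled batch only, anisotropic noise proportional to the distance from that point, and per-agent bias correction. The function names, the toy objective, and every hyperparameter value (`alpha`, `lr`, `sigma`, `eps`, `beta1`, `beta2`) are hypothetical; the authors' repository holds the actual implementation.

```python
import numpy as np


def consensus_point(x, fx, alpha):
    """Gibbs-weighted average of agents x (shape [n, d]) with objective values fx (shape [n])."""
    # Subtracting the minimum keeps exp() from underflowing to 0/0 for large alpha.
    w = np.exp(-alpha * (fx - fx.min()))
    return (w[:, None] * x).sum(axis=0) / w.sum()


def adam_cbo(f, x0, batch=50, steps=3000, alpha=100.0, lr=0.05,
             sigma=0.1, beta1=0.9, beta2=0.99, eps=0.05, seed=0):
    """Random-batch CBO with an Adam-style rescaled drift (illustrative sketch).

    f maps agent positions of shape [n, d] to objective values of shape [n].
    Each step updates only `batch` randomly chosen agents.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n, d = x.shape
    m = np.zeros_like(x)   # per-agent first-moment estimate of the drift
    v = np.zeros_like(x)   # per-agent second-moment estimate of the drift
    t = np.zeros(n)        # per-agent update counts, used for bias correction
    for _ in range(steps):
        idx = rng.choice(n, size=batch, replace=False)
        xb = x[idx]
        # Consensus point of the sampled batch only (hypothetical choice).
        xbar = consensus_point(xb, f(xb), alpha)
        drift = xb - xbar
        t[idx] += 1
        m[idx] = beta1 * m[idx] + (1 - beta1) * drift
        v[idx] = beta2 * v[idx] + (1 - beta2) * drift**2
        m_hat = m[idx] / (1 - beta1 ** t[idx])[:, None]
        v_hat = v[idx] / (1 - beta2 ** t[idx])[:, None]
        # Exploration noise scales with each coordinate's distance to the consensus point.
        noise = sigma * drift * rng.standard_normal((batch, d))
        x[idx] = xb - lr * m_hat / (np.sqrt(v_hat) + eps) + noise
    # Report the consensus point of the whole final population.
    return consensus_point(x, f(x), alpha)


# Usage on a hypothetical toy objective with minimizer (1, 1).
def quadratic(x):
    return ((x - 1.0) ** 2).sum(axis=1)


x0 = np.random.default_rng(1).uniform(-3.0, 3.0, size=(5000, 2))
x_star = adam_cbo(quadratic, x0)
print(x_star)
```

Choosing `eps` on the order of `lr` is a design choice for this sketch: it limits how far Adam's scale-free normalization can push an agent that already sits very close to the consensus point.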
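The setup row evaluates the value function with 64 SDE sample paths, terminal time T = 1, and time step T/20. The sketch below shows one way such a Monte Carlo cost estimate can be computed under an Euler-Maruyama discretization. The drift, the constant diffusion coefficient, the running and terminal costs, and the feedback policy are hypothetical placeholders, not the paper's LQG, Ginzburg-Landau, or systemic risk specifications; `mc_cost` is an illustrative name.

```python
import numpy as np


def mc_cost(policy, x0, drift, sigma, running_cost, terminal_cost,
            T=1.0, n_steps=20, n_paths=64, seed=0):
    """Estimate E[ sum_i r(X_i, u_i) dt + g(X_N) ] with Euler-Maruyama.

    policy(t, x) returns controls u; drift(x, u) returns b(x, u);
    sigma is a constant scalar diffusion coefficient; x has shape [n_paths, d].
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.tile(np.asarray(x0, dtype=float), (n_paths, 1))  # every path starts at x0
    cost = np.zeros(n_paths)
    for i in range(n_steps):
        u = policy(i * dt, x)
        cost += running_cost(x, u) * dt  # left-point rule for the running cost
        dw = np.sqrt(dt) * rng.standard_normal(x.shape)
        x = x + drift(x, u) * dt + sigma * dw
    cost += terminal_cost(x)
    return cost.mean()


# Usage: a hypothetical LQ-type problem dX = u dt + sqrt(2) dW in d = 2,
# with cost int |u|^2 dt + |X_T|^2 and the linear feedback u = -x.
estimate = mc_cost(
    policy=lambda t, x: -x,
    x0=np.zeros(2),
    drift=lambda x, u: u,
    sigma=np.sqrt(2.0),
    running_cost=lambda x, u: (u ** 2).sum(axis=1),
    terminal_cost=lambda x: (x ** 2).sum(axis=1),
)
print(estimate)
```

In a CBO-based controller, an estimate like this would play the role of the objective `f` evaluated for each agent's control parameters.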
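The Dataset Splits row tests the systemic risk policy against three families of initial distributions µ0. The samplers below draw one scalar initial state per agent under stated assumptions: N(0, 0.1) is read as mean 0 and variance 0.1; the two-Gaussian sampler takes the Bernoulli parameter `p` and offset `k` as required arguments, since the row does not pin them down; and in the three-Gaussian mixture the label 3U is assumed to be uniform on {0, 1, 2}, with label 2 giving mean 0. All function names are hypothetical.

```python
import numpy as np


def sample_gaussian(n, rng, var=0.1):
    # x0 ~ N(0, var); treating 0.1 as a variance (not a std) is an assumption.
    return rng.normal(0.0, np.sqrt(var), size=n)


def sample_two_gaussians(n, rng, p, k, theta=0.1):
    # x0 = P(-k + theta*y) + (1 - P)(k + theta*z), P ~ Bernoulli(p), y, z ~ N(0, 1).
    P = rng.random(n) < p
    y = rng.standard_normal(n)
    z = rng.standard_normal(n)
    return np.where(P, -k + theta * y, k + theta * z)


def sample_three_gaussians(n, rng, k=0.3, theta=0.07):
    # Assumption: the label 3U is uniform on {0, 1, 2};
    # label 0 -> mean -k, label 1 -> mean +k, label 2 -> mean 0.
    label = rng.integers(0, 3, size=n)
    mean = np.where(label == 0, -k, np.where(label == 1, k, 0.0))
    return mean + theta * rng.standard_normal(n)


# Usage: initial states for 100 agents drawn from the three-Gaussian mixture.
rng = np.random.default_rng(0)
x_init = sample_three_gaussians(100, rng)
print(x_init[:5])
```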