Learning Safe Control via On-the-Fly Bandit Exploration
Authors: Alexandre Capone, Ryan Kazuo Cosner, Aaron Ames, Sandra Hirche
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now showcase how our approach performs using two numerical simulations: a cruise control system and a quadrotor with ground dynamics. We additionally perform experiments that aim to answer the following questions: How does our method compare with random exploration during infeasibility? How does the choice of sampling time Δt affect safety? We perform 100 simulations with different initial conditions, uniformly sampled from a region within the safe set. We report how often each method fails, i.e., leads to a positive value for the CBF during the simulation. The average number of failures for both settings is shown in Figure 5. |
| Researcher Affiliation | Academia | ¹Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA; ²Department of Mechanical and Civil Engineering, California Institute of Technology, Pasadena, CA, USA; ³TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany. Correspondence to: Alexandre Capone <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Safe Control via On-the-Fly Bandit Exploration. Input: sampling time Δt, GP prior, CBF h, class-KL function α. 1: Set EXPLORE = FALSE 2: for t ∈ [0, ∞) do 3: if EXPLORE == FALSE then 4: if max_{u∈U} LCB_N(x, u) > −α(h(x)) + ε/2 then 5: Solve (12) and apply π_{N,safe}(x) 6: else 7: Set EXPLORE = TRUE 8: Set N = N + 1 9: Set t_N = t 10: Set x^(N) = x 11: Compute u^(N) by solving (14) 12: end if 13: end if 14: if EXPLORE == TRUE then 15: if t < t_N + Δt then 16: Apply a locally Lipschitz controller π with π(x^(N)) = u^(N) 17: Collect noisy measurement y^(N) = ẋ(t_N) − f̂(x(t_N)) − ĝ(x(t_N)) u^(N) + ξ^(N) 18: else if t = t_N + Δt then 19: Set EXPLORE = FALSE 20: Set D_N = D_{N−1} ∪ {z^(N), y^(N)} and update GP 21: end if 22: end if 23: end for |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code, nor does it include a link to a code repository or mention code in supplementary materials. |
| Open Datasets | No | The paper references "the road vehicle model presented in Castañeda et al. (2021)" and describes "quadrotor dynamics" in Appendix A.2, but it does not provide concrete access information (e.g., a link, DOI, or repository name) for any publicly available dataset used in its experiments. It refers to models and dynamics descriptions, not a specific dataset. |
| Dataset Splits | No | The paper mentions "We perform 100 simulations with different initial conditions, uniformly sampled from a region within the safe set." and "We assume to have N = 10 data points at the start of the simulation, which we employ exclusively to learn the kernel hyperparameters." This describes initial conditions and initial data for kernel learning, not specific training, validation, or test splits for a dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, memory) used to run the numerical simulations. |
| Software Dependencies | No | The paper mentions using "conventional second-order cone program optimizers" and models involving "Gaussian process (GPs)", but it does not provide specific names or version numbers for any software dependencies or libraries used. |
| Experiment Setup | Yes | 5.1. Cruise Control: As a control barrier function, we employ h(x) = z − T_h v, where T_h = 1.8, which aims to maintain a safe distance between the ego vehicle and the vehicle in front. The nominal controller π_nom(x) used for the robust CBF-SOCP (12) is a P-controller π_nom = −10(v − v_d), where v_d = 24 corresponds to the desired velocity. We employ squared-exponential kernels to model f and g and assume to have N = 10 data points at the start of the simulation, which we employ exclusively to learn the kernel hyperparameters. 5.2. Quadrotor: The first CBF is h(x) = 10(p_z − T_z v_z), where T_z = 0.1. The nominal controller π_nom(x) used for the robust safety filter (12) corresponds to a differentially flat controller, computed as in Faessler et al. (2018), and we consider bounded thrust, with |T| ≤ 15000. Similarly to the cruise control setting, we use squared-exponential kernels and assume to have N = 10 data points at the start of the simulation to learn the kernel hyperparameters. 5.3. Comparison with Random Exploratory Control: We perform 100 simulations with different initial conditions, uniformly sampled from a region within the safe set. A non-zero failure rate is expected at low sampling frequencies since too little data is collected to learn a model quickly. However, our approach nonetheless performs better than the random control input-based approach. |
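Algorithm 1 in the Pseudocode row alternates between a robust safety filter and short exploration episodes. The Python sketch below mirrors that switching logic in discrete time for a scalar state. It is a minimal sketch under our own assumptions, not the authors' implementation: the class `OnTheFlyExplorer` and every callable (`max_lcb`, `safe_filter`, `explore_input`, `measure`) are hypothetical stand-ins for the GP lower confidence bound, the SOCP (12), the exploration problem (14), and the sensor; the feasibility threshold −α(h(x)) + ε/2 is our reading of the extracted step 4; and the exploratory controller π is simply held constant at u^(N).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Sample = Tuple[Tuple[float, float], float]  # (z^(N) = (x^(N), u^(N)), y^(N))


@dataclass
class OnTheFlyExplorer:
    """Hypothetical discrete-time sketch of the EXPLORE switch in Algorithm 1.

    Every callable below is a placeholder we assume, not the paper's API.
    """
    max_lcb: Callable[[float, List[Sample]], float]  # max_{u in U} LCB_N(x, u) given D_N
    cbf: Callable[[float], float]                    # h(x)
    alpha: Callable[[float], float]                  # alpha applied to h(x)
    safe_filter: Callable[[float], float]            # stands in for solving (12)
    explore_input: Callable[[float], float]          # stands in for solving (14)
    dt: float                                        # sampling time Δt
    eps: float                                       # margin ε
    explore: bool = False
    n: int = 0
    t_n: float = 0.0
    x_n: float = 0.0
    u_n: float = 0.0
    y_n: float = 0.0
    data: List[Sample] = field(default_factory=list)

    def step(self, t: float, x: float,
             measure: Callable[[float, float], float]) -> float:
        """Return the input applied at time t; measure(x, u) returns a noisy y^(N)."""
        # Steps 18-20: once Δt has elapsed, stop exploring and grow the data set.
        if self.explore and t >= self.t_n + self.dt:
            self.explore = False
            self.data.append(((self.x_n, self.u_n), self.y_n))
        if not self.explore:
            # Step 4 (our reading): keep the safety filter while it is feasible with margin.
            if self.max_lcb(x, self.data) > -self.alpha(self.cbf(x)) + self.eps / 2:
                return self.safe_filter(x)  # step 5
            # Steps 7-11: margin lost, so open exploration episode N at (t, x).
            self.explore = True
            self.n += 1
            self.t_n, self.x_n = t, x
            self.u_n = self.explore_input(x)
            # Step 17: one noisy measurement at z^(N) = (x^(N), u^(N)).
            self.y_n = measure(self.x_n, self.u_n)
        # Step 16: a constant input is a locally Lipschitz pi with pi(x^(N)) = u^(N).
        return self.u_n
```

Because `max_lcb` receives the growing `data` list, a real implementation would refit the GP there, matching the update in step 20; a toy `max_lcb` can simply return a larger bound once `data` is non-empty.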
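The cruise-control quantities in the Experiment Setup row can be written out directly. This is a minimal sketch, assuming the state is x = (v, z) with ego speed v and gap z to the lead vehicle, SI units, the usual safe-set convention {x : h(x) ≥ 0}, and a negative-feedback nominal controller π_nom = −10(v − v_d); the function names `cruise_cbf`, `pi_nom`, and `is_safe` are hypothetical.

```python
# Constants from Section 5.1; units are our assumption (m, m/s, s).
T_H = 1.8   # time headway T_h
V_D = 24.0  # desired velocity v_d
K_P = 10.0  # gain of the nominal P-controller


def cruise_cbf(v: float, z: float) -> float:
    """h(x) = z - T_h * v: the gap z must exceed T_h seconds of travel at speed v."""
    return z - T_H * v


def pi_nom(v: float) -> float:
    """Nominal P-controller pi_nom = -K_P * (v - v_d): pushes v toward v_d."""
    return -K_P * (v - V_D)


def is_safe(v: float, z: float) -> bool:
    """True if x = (v, z) lies in the assumed safe set {x : h(x) >= 0}."""
    return cruise_cbf(v, z) >= 0.0
```

For instance, at v = 20 the gap must be at least T_h v = 36 to stay safe, while π_nom(20) = 40 accelerates the vehicle toward v_d = 24. The robust CBF-SOCP (12) exists to override such nominal inputs when they would drive h negative.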
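The evaluation described in the Research Type row (100 simulations from initial conditions sampled uniformly within the safe set, reporting how often each method fails) amounts to a Monte Carlo failure-rate estimate. A minimal sketch with hypothetical hooks: `simulate` returns the states of one closed-loop run, `sample_x0` draws an initial condition, and `is_unsafe` encodes the failure criterion, which is left to the caller because it depends on the CBF sign convention.

```python
import random
from typing import Callable, Sequence


def failure_rate(
    simulate: Callable[[float], Sequence[float]],
    sample_x0: Callable[[random.Random], float],
    is_unsafe: Callable[[float], bool],
    n_runs: int = 100,
    seed: int = 0,
) -> float:
    """Fraction of closed-loop runs whose trajectory ever violates the CBF.

    simulate, sample_x0 and is_unsafe are hypothetical hooks supplied by the user.
    """
    rng = random.Random(seed)  # fixed seed so repeated evaluations draw the same x0s
    failures = 0
    for _ in range(n_runs):
        x0 = sample_x0(rng)              # initial condition inside the safe set
        trajectory = simulate(x0)        # states visited during one simulation
        if any(is_unsafe(x) for x in trajectory):
            failures += 1                # count each run at most once
    return failures / n_runs
```

Fixing `seed` lets two methods (for example, the bandit exploration scheme and random exploratory control) be compared on identical initial conditions.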