reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning Safe Control via On-the-Fly Bandit Exploration

Authors: Alexandre Capone, Ryan Kazuo Cosner, Aaron Ames, Sandra Hirche

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We now showcase how our approach performs using two numerical simulations: a cruise control system and a quadrotor with ground dynamics. We additionally perform experiments that aim to answer the following questions: How does our method compare with random exploration during infeasibility? How does the choice of sampling time Δt affect safety? We perform 100 simulations with different initial conditions, uniformly sampled from a region within the safe set. We report how often each method fails, i.e., leads to a positive value for the CBF during the simulation. The average number of failures for both settings is shown in Figure 5.
Researcher Affiliation	Academia	1 Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA; 2 Department of Mechanical and Civil Engineering, California Institute of Technology, Pasadena, CA, USA; 3 TUM School of Computation, Information and Technology, Technical University of Munich, Munich, Germany. Correspondence to: Alexandre Capone <EMAIL>.
Pseudocode	Yes	Algorithm 1 Safe Control via On-the-Fly Bandit Exploration. Input: Sampling time Δt, GP prior, CBF h, class-KL function α. 1: Set EXPLORE = FALSE 2: for t ∈ [0, ∞) do 3: if EXPLORE == FALSE then 4: if max_{u ∈ U} LCB_N(x, u) > α(h(x)) + ϵ/2 then 5: Solve (12) and apply π_{N,safe}(x) 6: else 7: Set EXPLORE = TRUE 8: Set N = N + 1 9: Set t_N = t 10: Set x^(N) = x 11: Compute u^(N) by solving (14) 12: end if 13: end if 14: if EXPLORE == TRUE then 15: if t < t_N + Δt then 16: Apply a locally Lipschitz controller π with π(x^(N)) = u^(N) 17: Collect noisy measurement y^(N) = ẋ(t_N) − f̂(x(t_N)) − ĝ(x(t_N)) u^(N) + ξ^(N) 18: else if t = t_N + Δt then 19: Set EXPLORE = FALSE 20: Set D_N = D_{N−1} ∪ {z^(N), y^(N)} and update GP 21: end if 22: end if 23: end for
Open Source Code	No	The paper does not provide an explicit statement about releasing code, nor does it include a link to a code repository or mention code in supplementary materials.
Open Datasets	No	The paper references "the road vehicle model presented in Castañeda et al. (2021)" and describes "quadrotor dynamics" in Appendix A.2, but it does not provide concrete access information (e.g., a link, DOI, or repository name) for any publicly available dataset used in its experiments. It refers to models and dynamics descriptions, not a specific dataset.
Dataset Splits	No	The paper mentions "We perform 100 simulations with different initial conditions, uniformly sampled from a region within the safe set." and "We assume to have N = 10 data points at the start of the simulation, which we employ exclusively to learn the kernel hyperparameters." This describes initial conditions and initial data for kernel learning, not specific training, validation, or test splits for a dataset.
Hardware Specification	No	The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, memory) used to run the numerical simulations.
Software Dependencies	No	The paper mentions using "conventional second-order cone program optimizers" and models involving "Gaussian process (GPs)", but it does not provide specific names or version numbers for any software dependencies or libraries used.
Experiment Setup	Yes	5.1. Cruise Control: As a control barrier function, we employ h(x) = z − T_h v, where T_h = 1.8, which aims to maintain a safe distance between the ego vehicle and the vehicle in front. The nominal controller π_nom(x) used for the robust CBF-SOCP (12) is a P-controller π_nom = 10(v − v_d), where v_d = 24 corresponds to the desired velocity. We employ squared-exponential kernels to model f and g and assume to have N = 10 data points at the start of the simulation, which we employ exclusively to learn the kernel hyperparameters. 5.2. Quadrotor: The first CBF is h(x) = 10(p_z − T_z v_z), where T_z = 0.1. The nominal controller π_nom(x) used for the robust safety filter (12) corresponds to a differentially flat controller, computed as in Faessler et al. (2018), and we consider bounded thrust, with |T| ≤ 15000. Similarly to the cruise control setting, we use squared-exponential kernels and assume to have N = 10 data points at the start of the simulation to learn the kernel hyperparameters. 5.3. Comparison with Random Exploratory Control: We perform 100 simulations with different initial conditions, uniformly sampled from a region within the safe set. A non-zero failure rate is expected at low sampling frequencies since too little data is collected to learn a model quickly. However, our approach nonetheless performs better than the random control input-based approach.
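
Because the paper releases no code, the switching logic quoted in the Pseudocode row can be hard to picture. The following is a minimal Python sketch of only the EXPLORE flag bookkeeping from Algorithm 1, not the authors' implementation. It makes several assumptions: time is discretized into integer ticks, the sampling time is expressed as a hypothetical `hold_steps` count, the feasibility test of line 4 is replaced by a user-supplied `is_feasible` callback, the state is a scalar, and the noisy measurement is taken once at the start of each exploration round through a hypothetical `measure` callback. The SOCP in (12), the exploratory input from (14), and the GP update are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ExploreSwitch:
    """Toy discrete-time rendering of the EXPLORE flag in Algorithm 1."""
    hold_steps: int                        # sampling time in ticks (assumption)
    is_feasible: Callable[[float], bool]   # hypothetical stand-in for the line-4 test
    explore: bool = False
    n: int = 0                             # exploration rounds so far (N)
    k_n: int = 0                           # tick at which round N started (t_N)
    pending: Tuple[float, float] = (0.0, 0.0)
    data: List[Tuple[float, float]] = field(default_factory=list)

    def step(self, k: int, x: float, measure: Callable[[float], float]) -> str:
        """Advance one tick and report which controller would be applied."""
        if not self.explore:
            if self.is_feasible(x):
                return "safe"              # lines 4-5: safety filter is applied
            self.explore = True            # lines 7-10: start a new round
            self.n += 1
            self.k_n = k
            self.pending = (x, measure(x)) # line 17, simplified to one sample
        if k < self.k_n + self.hold_steps:
            return "explore"               # line 16: hold the exploratory input
        self.explore = False               # lines 19-20: store the data point
        self.data.append(self.pending)     # a GP refit would happen here
        return "update"


# Usage: the state is infeasible (x <= 0) for three ticks, then recovers.
sw = ExploreSwitch(hold_steps=2, is_feasible=lambda x: x > 0.0)
states = [1.0, -1.0, -1.0, -1.0, 1.0]
modes = [sw.step(k, x, measure=lambda s: 2.0 * s) for k, x in enumerate(states)]
print(modes)
```

In this sketch the exploratory input is held for exactly `hold_steps` ticks before the collected pair is appended to the data set, which mirrors the way the algorithm defers the GP update until the sampling interval has elapsed.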