reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Neural Control and Certificate Repair via Runtime Monitoring

Authors: Emily Yu, Đorđe Žikelić, Thomas A. Henzinger

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of our approach empirically by using it to repair and to boost the safety rate of neural network policies learned by a state-of-the-art method for learning-based control on two autonomous system control tasks.
Researcher Affiliation	Academia	Emily Yu1, Đorđe Žikelić2, Thomas A. Henzinger1 1Institute of Science and Technology Austria, Klosterneuburg, Austria 2Singapore Management University, Singapore, Singapore EMAIL, EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1: Policy Repair. Monitoring-based neural network policy and certificate repair for safety properties.
Open Source Code	No	We implemented a prototype of our method in Python 3.6 as an extension of the SABLAS codebase (Qin, Sun, and Fan 2022). The paper mentions extending a codebase, but does not state that their specific code for this work is open-source or provide a link.
Open Datasets	Yes	We consider the benchmarks from the SABLAS codebase, together with policies and certificate functions learned by the SABLAS method for these benchmarks. We then apply our method to these control policies and certificate functions in order to experimentally evaluate the ability of our method to detect incorrect behaviors and to repair them. The first benchmark concerns a parcel delivery drone flying in a city (called the active drone)... The second benchmark Ship Env concerns a ship moving in a river among 32 other ships.
Dataset Splits	No	For each monitored execution, N = 2000 and N = 1200 states are observed for Ship Env and Drone Env, respectively, spaced out at time intervals of t = 0.1s. In each environment, the states of the eight nearest obstacles (i.e. drones or ships) are given as observations to the policy, as well as the certificate functions. The paper describes parameters for data observation during monitoring (N, t) and counts of initial states or executions (D), but not explicit training/validation/test splits of a static dataset.
Hardware Specification	No	We implemented a prototype of our method in Python 3.6 as an extension of the SABLAS codebase (Qin, Sun, and Fan 2022). The paper does not specify any hardware used for running experiments, such as CPU, GPU models, or cloud resources.
Software Dependencies	Yes	We implemented a prototype of our method in Python 3.6 as an extension of the SABLAS codebase (Qin, Sun, and Fan 2022).
Experiment Setup	Yes	For each monitored execution, N = 2000 and N = 1200 states are observed for Ship Env and Drone Env, respectively, spaced out at time intervals of t = 0.1s. In addition, if the Pred PM monitor is to be used, the algorithm also takes as input the three thresholds ξU, ξS, and ξN. ... We conducted an analogous experiment with a certificate function consisting both of a barrier and a Lyapunov function and again observed significant improvements upon repair. Further details, including results on additional threshold configurations of Pred PM beyond the three configurations discussed in Table 1, can be found in the extended version (Yu, Zikelic, and Henzinger 2024).