PREMAP: A Unifying PREiMage APproximation Framework for Neural Networks
Authors: Xiyue Zhang, Benjie Wang, Marta Kwiatkowska, Huan Zhang
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a range of tasks, demonstrating significant improvement in efficiency and scalability to high-input-dimensional image classification tasks compared to state-of-the-art techniques. Further, we showcase the application to quantitative verification and robustness analysis, presenting a sound and complete algorithm for the former and providing sound quantitative results for the latter. Keywords: preimage approximation, abstraction and refinement, linear relaxation, formal verification, neural network |
| Researcher Affiliation | Academia | Xiyue Zhang EMAIL Benjie Wang EMAIL Marta Kwiatkowska EMAIL Department of Computer Science University of Oxford Oxford, OX1 3QD, UK Huan Zhang EMAIL Department of Electrical and Computer Engineering University of Illinois Urbana Champaign Urbana, IL 61801, USA |
| Pseudocode | Yes | Algorithm 1: Gen Approx Input: List of subregions C, Output set O, Number of samples N, Boolean Under Output: List of polytopes T |
| Open Source Code | Yes | 6. a publicly-available software implementation of our preimage approximation framework (Zhang et al., 2025). |
| Open Datasets | Yes | We evaluate PREMAP on a benchmark of reinforcement learning and image classification tasks. Besides the vehicle parking task of Ayala et al. (2011) shown in the running example, we consider the following tasks: (1) aircraft collision avoidance system (VCAS) from Julian and Kochenderfer (2019) with 9 feed-forward neural networks (FNNs); (2) neural network controllers from Müller et al. (2022) for three reinforcement learning tasks (Cartpole, Lunarlander, and Dubinsrejoin) as in Brockman et al. (2016); and (3) the neural network for MNIST classification from VNN-COMP 2022 (Brix and Shi, 2022). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It describes input regions used for the *analysis* of pre-trained models, but not the splits for *training* the models or for its own experimental setup in terms of data partitioning. |
| Hardware Specification | No | The paper mentions that the method uses a "GPU-friendly manner" and leverages "GPU parallelism" but does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, specific solvers or libraries with their versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | To evaluate the quality of the preimage approximation, we define the coverage ratio to be the ratio of volume covered by the approximation to the volume of the exact preimage, i.e., $\text{cov}(\mathcal{T}, f^{-1}_{\mathcal{C}}(O)) := \text{vol}(\mathcal{T}) / \text{vol}(f^{-1}_{\mathcal{C}}(O))$. Note that this is a normalised measure for assessing the quality of the approximation, as used in Algorithm 3 when comparing with the target coverage proportion $p$ for termination of the refinement loop. In practice, we use Monte Carlo estimation to compute $\text{vol}(f^{-1}_{\mathcal{C}}(O))$ as $\widehat{\text{vol}}(f^{-1}_{\mathcal{C}}(O)) = \text{vol}(\mathcal{C}) \cdot \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}_{f(x_i) \in O}$, where $x_1, \ldots, x_N$ are samples from $\mathcal{C}$. In Algorithm 2, the target volume (stopping criterion) is set as $v = r \cdot \widehat{\text{vol}}(f^{-1}_{\mathcal{C}}(O))$, where $r$ is the target coverage ratio. Thus, in our experimental results, the coverage ratio reported will be greater than (resp. less than) the target coverage ratio for an under-approximation (resp. over-approximation), unless the maximum number of iterations is reached. |
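The Monte Carlo volume estimation and stopping criterion quoted in the Experiment Setup row can be sketched numerically. The following is a minimal illustration, not the paper's implementation: the linear map `f`, the box input region, and the output set `O` are toy stand-ins chosen so the estimate can be checked by hand.

```python
import numpy as np

def mc_preimage_volume(f, lo, hi, in_output_set, n_samples=10_000, seed=0):
    """Monte Carlo estimate of vol(f^-1_C(O)) over a box region C = [lo, hi].

    Samples x_i uniformly from C and scales the hit frequency by vol(C),
    mirroring vol_hat = vol(C) * (1/N) * sum_i 1[f(x_i) in O].
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    xs = rng.uniform(lo, hi, size=(n_samples, lo.size))
    hits = np.fromiter((in_output_set(f(x)) for x in xs), dtype=bool, count=n_samples)
    vol_c = np.prod(hi - lo)  # volume of the box input region C
    return vol_c * hits.mean()

def coverage_ratio(vol_approx, vol_preimage_hat):
    """cov(T, f^-1_C(O)) := vol(T) / vol(f^-1_C(O))."""
    return vol_approx / vol_preimage_hat

# Toy stand-in for a network: f(x) = Wx, with output set
# O = {y : y_0 >= y_1}, i.e., class 0 scores at least class 1.
W = np.array([[1.0, -1.0], [0.5, 0.5]])
f = lambda x: W @ x
in_O = lambda y: y[0] >= y[1]

# Estimated preimage volume over C = [-1, 1]^2 (exact value is 2.0 here).
vol_hat = mc_preimage_volume(f, lo=[-1, -1], hi=[1, 1], in_output_set=in_O)

# Stopping criterion from Algorithm 2: target volume v = r * vol_hat.
r = 0.9
v = r * vol_hat
```

Here the exact preimage is the half-plane $x_0 \ge 3x_1$ intersected with the box, whose volume is 2.0, so `vol_hat` should land close to that; a polytope under-approximation with `vol_approx` near `v` would then report a coverage ratio near the target `r`.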