PREMAP: A Unifying PREiMage APproximation Framework for Neural Networks
Authors: Xiyue Zhang, Benjie Wang, Marta Kwiatkowska, Huan Zhang
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on a range of tasks, demonstrating significant improvement in efficiency and scalability to high-input-dimensional image classification tasks compared to state-of-the-art techniques. Further, we showcase the application to quantitative verification and robustness analysis, presenting a sound and complete algorithm for the former and providing sound quantitative results for the latter. Keywords: preimage approximation, abstraction and refinement, linear relaxation, formal verification, neural network |
| Researcher Affiliation | Academia | Xiyue Zhang EMAIL Benjie Wang EMAIL Marta Kwiatkowska EMAIL Department of Computer Science University of Oxford Oxford, OX1 3QD, UK Huan Zhang EMAIL Department of Electrical and Computer Engineering University of Illinois Urbana Champaign Urbana, IL 61801, USA |
| Pseudocode | Yes | Algorithm 1: Gen Approx Input: List of subregions C, Output set O, Number of samples N, Boolean Under Output: List of polytopes T |
| Open Source Code | Yes | 6. a publicly-available software implementation of our preimage approximation framework (Zhang et al., 2025). |
| Open Datasets | Yes | We evaluate PREMAP on a benchmark of reinforcement learning and image classification tasks. Besides the vehicle parking task of Ayala et al. (2011) shown in the running example, we consider the following tasks: (1) aircraft collision avoidance system (VCAS) from Julian and Kochenderfer (2019) with 9 feed-forward neural networks (FNNs); (2) neural network controllers from Müller et al. (2022) for three reinforcement learning tasks (Cartpole, Lunarlander, and Dubinsrejoin) as in Brockman et al. (2016); and (3) the neural network for MNIST classification from VNN-COMP 2022 (Brix and Shi, 2022). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It describes input regions used for the *analysis* of pre-trained models, but not the splits for *training* the models or for its own experimental setup in terms of data partitioning. |
| Hardware Specification | No | The paper mentions that the method uses a "GPU-friendly manner" and leverages "GPU parallelism" but does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for the experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python, PyTorch, specific solvers or libraries with their versions) that would be needed to replicate the experiments. |
| Experiment Setup | Yes | To evaluate the quality of the preimage approximation, we define the coverage ratio to be the ratio of volume covered by the approximation to the volume of the exact preimage, i.e., $\text{cov}(\mathcal{T}, f^{-1}_{\mathcal{C}}(O)) := \text{vol}(\mathcal{T}) / \text{vol}(f^{-1}_{\mathcal{C}}(O))$. Note that this is a normalised measure for assessing the quality of the approximation, as used in Algorithm 3 when comparing with the target coverage proportion $p$ for termination of the refinement loop. In practice, we use Monte Carlo estimation to compute $\text{vol}(f^{-1}_{\mathcal{C}}(O))$ as $\widehat{\text{vol}}(f^{-1}_{\mathcal{C}}(O)) = \text{vol}(\mathcal{C}) \cdot \frac{1}{N}\sum_{i=1}^{N} \mathbb{1}_{f(x_i) \in O}$, where $x_1, \ldots, x_N$ are samples from $\mathcal{C}$. In Algorithm 2, the target volume (stopping criterion) is set as $v = r \cdot \widehat{\text{vol}}(f^{-1}_{\mathcal{C}}(O))$, where $r$ is the target coverage ratio. Thus, in our experimental results, the coverage ratio reported will be greater than (resp. less than) the target coverage ratio for an under-approximation (resp. over-approximation), unless the maximum number of iterations is reached. |
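The Monte Carlo volume estimation and stopping criterion quoted in the Experiment Setup row can be sketched numerically. The following is a minimal illustration, not the paper's implementation: the linear map `f`, the box input region, and the output set `O` are toy stand-ins chosen so the estimate can be checked by hand.

```python
import numpy as np

def mc_preimage_volume(f, lo, hi, in_output_set, n_samples=10_000, seed=0):
    """Monte Carlo estimate of vol(f^-1_C(O)) over a box region C = [lo, hi].

    Samples x_i uniformly from C and scales the hit frequency by vol(C),
    mirroring vol_hat = vol(C) * (1/N) * sum_i 1[f(x_i) in O].
    """
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    xs = rng.uniform(lo, hi, size=(n_samples, lo.size))
    hits = np.fromiter((in_output_set(f(x)) for x in xs), dtype=bool, count=n_samples)
    vol_c = np.prod(hi - lo)  # volume of the box input region C
    return vol_c * hits.mean()

def coverage_ratio(vol_approx, vol_preimage_hat):
    """cov(T, f^-1_C(O)) := vol(T) / vol(f^-1_C(O))."""
    return vol_approx / vol_preimage_hat

# Toy stand-in for a network: f(x) = Wx, with output set
# O = {y : y_0 >= y_1}, i.e., class 0 scores at least class 1.
W = np.array([[1.0, -1.0], [0.5, 0.5]])
f = lambda x: W @ x
in_O = lambda y: y[0] >= y[1]

# Estimated preimage volume over C = [-1, 1]^2 (exact value is 2.0 here).
vol_hat = mc_preimage_volume(f, lo=[-1, -1], hi=[1, 1], in_output_set=in_O)

# Stopping criterion from Algorithm 2: target volume v = r * vol_hat.
r = 0.9
v = r * vol_hat
```

Here the exact preimage is the half-plane $x_0 \ge 3x_1$ intersected with the box, whose volume is 2.0, so `vol_hat` should land close to that; a polytope under-approximation with `vol_approx` near `v` would then report a coverage ratio near the target `r`.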