Training Verification-Friendly Neural Networks via Neuron Behavior Consistency

Authors: Zongxin Liu, Zhe Zhao, Fu Song, Jun Sun, Pengfei Yang, Xiaowei Huang, Lijun Zhang

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluated our method using the MNIST, Fashion MNIST, and CIFAR-10 datasets with various network architectures. The experimental results demonstrate that networks trained using our method are verification-friendly across different radii and architectures, whereas other tools fail to maintain verifiability as the radius increases.
Researcher Affiliation Collaboration 1 Key Laboratory of System Software (Chinese Academy of Sciences) and State Key Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing, China; 2 University of Chinese Academy of Sciences, Beijing, China; 3 Real AI, Beijing, China; 4 Nanjing Institute of Software Technology, Nanjing, China; 5 Singapore Management University, Singapore; 6 College of Computer and Information Science, Software College, Southwest University, Chongqing, China; 7 The University of Liverpool, Liverpool, United Kingdom. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode Yes Algorithm 1: Calculation of NBC; Algorithm 2: Calculation of ℓ_NBC
Open Source Code No The paper does not explicitly state that source code for the methodology is openly available, nor does it provide a link to a code repository. It refers to supplementary material (Liu et al. 2024b), which is an arXiv preprint rather than a code repository.
Open Datasets Yes Dataset. Networks are trained on three widely used datasets: MNIST (LeCun et al. 1998), Fashion-MNIST (Xiao, Rasul, and Vollgraf 2017), and CIFAR-10 (Krizhevsky 2009).
Dataset Splits No For each dataset, we select k images from each of the 10 categories in the test set. For each image x and its ground-truth label y, we verify the property that the network's output label remains y for input x under each perturbation ε. We set k = 100 and a timeout of 120 seconds for MNIST and Fashion-MNIST, and k = 20 with a timeout of 180 seconds for CIFAR-10.
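The evaluation protocol above can be sketched as a small script. This is a hypothetical illustration of the quoted budget (k images per class, per-instance timeouts), not the authors' code; the function and variable names are assumptions.

```python
# Illustrative sketch of the quoted evaluation protocol: k test images
# per class are verified for label invariance under an eps-perturbation,
# with a per-instance timeout. All identifiers here are hypothetical.

def evaluation_budget(k_per_class, num_classes=10):
    """Number of verification instances per perturbation radius."""
    return k_per_class * num_classes

# Per-dataset settings as quoted in the report.
settings = {
    "MNIST":         {"k": 100, "timeout_s": 120},
    "Fashion-MNIST": {"k": 100, "timeout_s": 120},
    "CIFAR-10":      {"k": 20,  "timeout_s": 180},
}

for name, cfg in settings.items():
    n = evaluation_budget(cfg["k"])
    print(f"{name}: {n} instances per radius, {cfg['timeout_s']}s timeout each")
```

Under these settings, MNIST and Fashion-MNIST each yield 1,000 verification instances per radius, and CIFAR-10 yields 200.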
Hardware Specification Yes Experiments are conducted on a server with 128 Intel Xeon Platinum 8336C CPUs, 128GB memory, and four NVIDIA GeForce RTX 4090 GPUs, running Debian GNU/Linux 10 (Buster).
Software Dependencies Yes We use Python 3.11.7 and PyTorch 2.1.2 for implementation.
Experiment Setup Yes Training. The batch size is set to 128, and the Adam optimizer is used. For RQ1, networks are trained under default settings for 400 epochs. For RQ2, each network is trained for 200 epochs using the original method, followed by an additional 200 epochs combining the original method with our approach, or vice versa. For RQ3, we train a base model using the CE loss, then fine-tune it separately using RS, Madry, TRADES, and our method, ensuring that each method maintains accuracy within a specified range.
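The epoch budgets described for RQ1 and RQ2 can be summarized as simple training schedules. This is a hedged sketch of the quoted setup only; the phase labels and function names are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the quoted training budgets. Batch size 128 and the Adam
# optimizer are stated in the report; all names below are hypothetical.

BATCH_SIZE = 128
OPTIMIZER = "Adam"

def rq1_schedule():
    # RQ1: default settings, 400 epochs.
    return [("ours-default", 400)]

def rq2_schedule(original_first=True):
    # RQ2: 200 epochs with the original method, then 200 epochs
    # combining it with the proposed approach -- or vice versa.
    phases = [("original", 200), ("original+ours", 200)]
    return phases if original_first else list(reversed(phases))

def total_epochs(schedule):
    return sum(epochs for _, epochs in schedule)
```

Either RQ2 ordering totals 400 epochs, matching the RQ1 budget, so accuracy comparisons across research questions are made at an equal training cost.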