ExpProof: Operationalizing Explanations for Confidential Models with ZKPs
Authors: Chhavi Yadav, Evan Laufer, Dan Boneh, Kamalika Chaudhuri
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments. We evaluate ExpProof on fully connected ReLU Neural Networks and Random Forests for three standard datasets on an Ubuntu server with 64 CPUs of x86-64 architecture and 256 GB of memory, without any explicit parallelization. Our results show that ExpProof is computationally feasible, with a maximum proof generation time of 1.5 minutes, verification time of 0.12 seconds and proof size of 13KB for NNs and standard LIME. |
| Researcher Affiliation | Academia | ¹UC San Diego, ²Stanford University. Correspondence to: Chhavi Yadav <EMAIL>, Evan Monroe Laufer <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 LIME (Ribeiro et al., 2016), Algorithm 2 STANDARD LIME VARIANTS, Algorithm 3 BORDERLIME, Algorithm 4 FIND CLOSEST POINT WITH OPP LABEL, Algorithm 5 ExpProof: Provable Explanation for Confidential Models, Algorithm 6 ZK-LIME, Algorithm 7 ZK-CHECK-POSEIDON, Algorithm 8 ZK-FIND-OPP-POINT, Algorithm 9 ZK-LASSO, Algorithm 10 ZK-TOP-K, Algorithm 11 ZK-EXPONENTIAL-KERNEL, Algorithm 12 ZK-UNIFORM-SAMPLE, Algorithm 13 ZK-GAUSSIAN-SAMPLE. |
| Open Source Code | Yes | Our code is publicly available at: https://github.com/emlaufer/ExpProof. |
| Open Datasets | Yes | Datasets & Models. We use three standard fairness benchmarks for experimentation: Adult (Becker & Kohavi, 1996), Credit (Yeh, 2016) and German Credit (Hofmann, 1994). |
| Dataset Splits | No | The paper mentions evaluating results on '50 different input points sampled randomly from the test set' but does not provide explicit overall training, validation, and test split percentages or counts for the datasets used to train the models. |
| Hardware Specification | Yes | Our ZKP experiments are run on an Ubuntu server with 64 CPUs of x86-64 architecture and 256 GB of memory, without any explicit parallelization. While ezkl automatically does multithreading on all the available cores, we do not use GPUs, do not modify ezkl to do more parallelization, and do not run any of the steps in ZK-LIME (Alg. 6) in parallel ourselves. |
| Software Dependencies | Yes | We code ExpProof with different variants of LIME in the ezkl library (Konduit, 2024) (Version 18.1.1), which uses Halo2 (Zcash Foundation, 2023) as its underlying proof system, in the Rust programming language, resulting in 3.7k lines of code. |
| Experiment Setup | Yes | Our neural networks are 2-layer fully connected ReLU-activated networks with 16 hidden units in each layer, trained using Stochastic Gradient Descent in PyTorch (Paszke et al., 2019) with a learning rate of 0.001 for 400 epochs. ... Our random forests are trained using Scikit-Learn (Pedregosa et al., 2011) with 5-6 decision trees in each forest. ... We use the LIME library for experimentation and run the different variants of LIME with number of neighboring samples n = 300 and length of explanation K = 5. Based on the sampling type, we either sample randomly from a hypercube with half-edge length 0.2 or from a Gaussian distribution centered at the input point with a standard deviation of 0.2. Based on the kernel type, we either do not use a kernel or use the exponential kernel with a bandwidth parameter of √(#features) × 0.75 (the default value in the LIME library). The remaining parameters also keep the default values of the LIME library. Our results are averaged over 50 different input points sampled randomly from the test set. The duality gap constant ϵ is set to 0.001. |
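The sampling and kernel choices in the setup above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names are ours, and the `sqrt(exp(...))` weighting with default bandwidth `0.75 * sqrt(#features)` follows the LIME library's default exponential kernel, which the quoted setup says it adopts.

```python
import numpy as np

def sample_neighbors(x, n=300, mode="uniform", scale=0.2, rng=None):
    """Sample n perturbed points around input x.

    mode="uniform": uniform over a hypercube of half-edge `scale` centered at x
                    (the paper's hypercube sampling with half-edge 0.2).
    mode="gaussian": Gaussian centered at x with standard deviation `scale`
                    (the paper's Gaussian sampling with std 0.2).
    """
    rng = rng or np.random.default_rng(0)
    d = x.shape[0]
    if mode == "uniform":
        return x + rng.uniform(-scale, scale, size=(n, d))
    return x + rng.normal(0.0, scale, size=(n, d))

def exponential_kernel_weights(x, neighbors, width=None):
    """Exponential kernel over Euclidean distances, as in the LIME library:
    weight = sqrt(exp(-d^2 / width^2)), default width = 0.75 * sqrt(#features)."""
    if width is None:
        width = 0.75 * np.sqrt(x.shape[0])
    dists = np.linalg.norm(neighbors - x, axis=1)
    return np.sqrt(np.exp(-(dists ** 2) / width ** 2))

# Example: a 10-feature input, n = 300 neighbors as in the paper's setup.
x = np.zeros(10)
nbrs = sample_neighbors(x, n=300, mode="uniform", scale=0.2)
w = exponential_kernel_weights(x, nbrs)
```

These weighted neighbors would then feed a sparse (Lasso) local fit to produce the K = 5 explanation coefficients; the ZK circuits (ZK-UNIFORM-SAMPLE, ZK-GAUSSIAN-SAMPLE, ZK-EXPONENTIAL-KERNEL, ZK-LASSO) prove the same steps were performed on the committed model.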