Scaling provable adversarial defenses
Authors: Eric Wong, Frank Schmidt, Jan Hendrik Metzen, J. Zico Kolter
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On both MNIST and CIFAR data sets, we train classifiers that improve substantially on the state of the art in provable robust adversarial error bounds: from 5.8% to 3.1% on MNIST (with ℓ∞ perturbations of ϵ = 0.1), and from 80% to 36.4% on CIFAR (with ℓ∞ perturbations of ϵ = 2/255). |
| Researcher Affiliation | Collaboration | Eric Wong, Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA 15213, EMAIL; Frank R. Schmidt, Bosch Center for Artificial Intelligence, Renningen, Germany, EMAIL; Jan Hendrik Metzen, Bosch Center for Artificial Intelligence, Renningen, Germany, EMAIL; J. Zico Kolter, Computer Science Department, Carnegie Mellon University and Bosch Center for Artificial Intelligence, Pittsburgh, PA 15213, EMAIL |
| Pseudocode | Yes | Algorithm 1: Estimating ‖ν₁‖₁ and Σ_{j∈I} ℓ_{i,j}[ν_{i,j}]₊ |
| Open Source Code | Yes | Code for all experiments in the paper is available at https://github.com/locuslab/convex_adversarial/. |
| Open Datasets | Yes | We evaluate the techniques in this paper on two main datasets: MNIST digit classification [Le Cun et al., 1998] and CIFAR10 image classification [Krizhevsky, 2009]. |
| Dataset Splits | No | The paper uses MNIST and CIFAR10 datasets but does not explicitly provide percentages or counts for train/validation/test splits, nor does it reference a specific, predefined split that includes a validation set. It discusses training and testing errors, but a separate validation set split is not detailed. |
| Hardware Specification | Yes | Each training epoch with 10 random projections takes less than a minute on a single GeForce GTX 1080 Ti graphics card, while using less than 700MB of memory. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | The ϵ value for training is scheduled from 0.01 to 0.1 over the first 20 epochs. |
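The pseudocode row refers to the paper's random-projection estimator: for a standard Cauchy vector r, the projection νr is Cauchy-distributed with scale ‖ν‖₁, so the median of |νr| over several draws estimates the ℓ1 norm without materializing it. A minimal NumPy sketch of that median-of-Cauchy-projections idea (function name and interface are illustrative, not the repository's API):

```python
import numpy as np

def estimate_l1_norms(nu, k=10, rng=None):
    """Estimate the l1 norm of each row of `nu` with k random Cauchy
    projections. For a standard Cauchy vector r, (nu @ r)[i] is Cauchy
    with scale ||nu[i]||_1, so the median of |nu @ r| over k independent
    draws is a consistent estimator of the row-wise l1 norms."""
    rng = np.random.default_rng(rng)
    # k projection vectors with i.i.d. standard Cauchy entries
    R = rng.standard_cauchy(size=(nu.shape[1], k))
    # median over projections of the absolute projected values
    return np.median(np.abs(nu @ R), axis=1)
```

With small k (the paper reports using 10 projections per epoch) the estimate is noisy but unbiased enough for training; accuracy improves as k grows.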
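The experiment-setup row describes ramping the training perturbation radius ϵ from 0.01 to 0.1 over the first 20 epochs. A minimal sketch of such a schedule, assuming linear interpolation (the paper states the endpoints and epoch count; the interpolation shape is an assumption here):

```python
def epsilon_schedule(epoch, eps_start=0.01, eps_end=0.1, warmup_epochs=20):
    """Ramp the robust-training radius epsilon linearly from eps_start
    to eps_end over the first `warmup_epochs` epochs, then hold it fixed."""
    if epoch >= warmup_epochs:
        return eps_end
    return eps_start + (eps_end - eps_start) * epoch / warmup_epochs
```

Starting with a small ϵ keeps the certified bound loose early in training, which avoids the optimization stalling before the network has learned anything.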