Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Bayes-Optimal View on Adversarial Examples
Authors: Eitan Richardson, Yair Weiss
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show that even when the optimal classifier is robust, standard CNN training consistently learns a vulnerable classifier (Figure 1d). At the same time, for exactly the same training data, RBF SVMs consistently learn a robust classifier (Figure 1e). |
| Researcher Affiliation | Academia | Eitan Richardson (EMAIL), School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel; Yair Weiss (EMAIL), School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel |
| Pseudocode | No | The paper describes methods and proofs in text and mathematical formulas, but does not contain explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The datasets and models will be made publicly available after publication at https://github.com/eitanrich/bayes-optimal-adv-examples |
| Open Datasets | Yes | We created 12 such datasets of faces (based on the CelebA dataset; Liu et al., 2015) and 3 datasets of digits (based on MNIST). To verify that our findings are not limited to specific datasets (faces, digits), we trained similar models on the more complex CIFAR-10 dataset (Krizhevsky and Hinton, 2009). |
| Dataset Splits | No | The paper mentions using training and test sets and splitting data by binary attributes, but does not provide specific percentages, sample counts, or citations to predefined splits for reproduction. For instance, in Appendix B.1, it states: 'The training data (CelebA, MNIST) is first split by the desired binary attribute (e.g. Smiling / Not Smiling) and then a separate MFA model was trained independently for each subset of training samples.' |
| Hardware Specification | No | The paper does not specify any particular hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using the CleverHans and sklearn/libsvm libraries, but does not provide specific version numbers for these software dependencies. For example: 'We used the CNN implementation and the CW-L2 attack from the CleverHans library (Papernot et al., 2016)' and 'We used the standard 2-class linear SVC implementation provided by sklearn/libsvm (Pedregosa et al., 2011).' |
| Experiment Setup | Yes | The network consists of 2D convolution layers with a kernel size of 3 and LeakyReLU activations. We used a stride of 2 in several equally-spaced layers along the depth of the network to reduce the spatial dimension, and at each such layer we doubled the width (number of channels). The network ends with a single fully-connected layer. All other hyper-parameters were left at their default values and the optimization method was Adam. We used the CleverHans implementation with the following hyper-parameters: 500 iterations, 3 binary-searches and a learning rate of 0.01. We used the default C = 1.0 and the highest γ value that still provided a high classification accuracy. |
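The Experiment Setup row describes the CNN only in prose: kernel-3 convolutions, LeakyReLU activations, stride-2 layers placed at equally spaced depths, with the channel width doubling at each stride-2 layer. The sketch below computes the resulting per-layer schedule from that rule. It is an illustrative assumption, not the authors' code: the depth (8), number of downsampling layers (3), base width (16), and input size (32) are hypothetical values chosen for the example, since the paper quote does not fix them.

```python
def cnn_layer_schedule(input_size=32, depth=8, n_downsamples=3, base_width=16):
    """Return a per-layer (channels, spatial_size, stride) schedule.

    Implements the rule quoted in the Experiment Setup row: stride-2
    layers at (roughly) equally spaced depths, each one halving the
    spatial dimension and doubling the channel width. All defaults are
    illustrative assumptions, not values from the paper.
    """
    # Place the n_downsamples stride-2 layers at equally spaced depths.
    stride_layers = {
        round((i + 1) * depth / (n_downsamples + 1)) for i in range(n_downsamples)
    }
    layers = []
    channels, spatial = base_width, input_size
    for d in range(1, depth + 1):
        stride = 2 if d in stride_layers else 1
        if stride == 2:
            channels *= 2   # double the width at each downsampling layer
            spatial //= 2   # stride 2 halves the spatial dimension
        layers.append((channels, spatial, stride))
    return layers

schedule = cnn_layer_schedule()
```

With these assumed defaults, the stride-2 layers fall at depths 2, 4 and 6, so the network ends at 128 channels over a 4x4 spatial grid before the single fully-connected layer the paper describes.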