Adversarial Robustness with Semi-Infinite Constrained Learning
Authors: Alexander Robey, Luiz Chamon, George J. Pappas, Hamed Hassani, Alejandro Ribeiro
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we show that our approach can mitigate the trade-off between nominal and robust performance, yielding state-of-the-art results on MNIST and CIFAR-10. Our code is available at: https://github.com/arobey1/advbench. In this section, we include an empirical evaluation of the DALE algorithm. In particular, we consider two standard datasets: MNIST and CIFAR-10. For MNIST, we train four-layer CNNs and set Δ = {δ : ‖δ‖∞ ≤ 0.3}; for CIFAR-10, we train ResNet-18 models and set Δ = {δ : ‖δ‖∞ ≤ 8/255}. All hyperparameters and performance metrics are chosen with respect to the robust accuracy of a PGD adversary evaluated on a small hold-out validation set. (A PGD evaluation sketch appears after the table.) |
| Researcher Affiliation | Academia | Alexander Robey (University of Pennsylvania); Luiz F. O. Chamon (University of California, Berkeley); George J. Pappas (University of Pennsylvania); Hamed Hassani (University of Pennsylvania); Alejandro Ribeiro (University of Pennsylvania) |
| Pseudocode | Yes | Algorithm 1 Semi-Infinite Dual Adversarial Learning (DALE) (a hedged primal-dual sketch appears after the table) |
| Open Source Code | Yes | Our code is available at: https://github.com/arobey1/advbench. |
| Open Datasets | Yes | In particular, we consider two standard datasets: MNIST and CIFAR-10. [102] The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/. [104] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009. |
| Dataset Splits | No | No specific percentages or sample counts for training, validation, or test splits are provided. The paper only mentions using a 'small hold-out validation set' for hyperparameter tuning. |
| Hardware Specification | No | No specific hardware details (e.g., GPU models, CPU types, or memory) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'TensorFlow 2.x') are provided. The paper mentions optimizers and model architectures but not the software (or versions) used to implement them. |
| Experiment Setup | Yes | For MNIST, we train four-layer CNNs and set Δ = {δ : ‖δ‖∞ ≤ 0.3}; for CIFAR-10, we train ResNet-18 models and set Δ = {δ : ‖δ‖∞ ≤ 8/255}. All hyperparameters and performance metrics are chosen with respect to the robust accuracy of a PGD adversary evaluated on a small hold-out validation set. We used a batch size of 128 for both datasets and optimized the parameters using SGD with momentum (0.9) and a weight decay of 5e-4. For MNIST, we trained our model for 100 epochs with an initial learning rate of 0.01 that decayed by 0.1 every 30 epochs. For CIFAR-10, we trained our model for 200 epochs with an initial learning rate of 0.1 that decayed by 0.1 every 75 epochs. (The training configuration is sketched after the table.) |
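The Pseudocode row cites Algorithm 1 (DALE). As a rough illustration only, here is a minimal sketch of one primal-dual step in the spirit of dual adversarial learning: the model descends a Lagrangian that mixes the nominal loss with a dual-weighted robust loss, and the dual variable ascends on the constraint violation. The constraint level `rho`, the dual step size, the choice of inner maximizer, and all function names are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def dale_step(model, optimizer, x, y, attack, dual_var, rho=0.1, dual_lr=0.01):
    """One hedged primal-dual step: Lagrangian descent in the model
    parameters, dual ascent on the robust-loss constraint violation.
    `attack(model, x, y)` is any inner maximizer (e.g., PGD); `rho` is
    an assumed constraint level on the robust loss."""
    x_adv = attack(model, x, y)                 # approximate inner maximization
    optimizer.zero_grad()
    nominal = F.cross_entropy(model(x), y)      # nominal (clean) objective
    robust = F.cross_entropy(model(x_adv), y)   # constrained robust loss
    lagrangian = nominal + dual_var * robust
    lagrangian.backward()
    optimizer.step()                            # primal descent step
    with torch.no_grad():                       # dual ascent on the violation
        dual_var = torch.clamp(dual_var + dual_lr * (robust.detach() - rho),
                               min=0.0)
    return dual_var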
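The Experiment Setup row states that hyperparameters were selected by the robust accuracy of a PGD adversary on a hold-out set. A standard L∞ PGD attack (sign-gradient ascent steps projected back onto the ε-ball) can be sketched as follows; only the budgets ε = 0.3 (MNIST) and ε = 8/255 (CIFAR-10) come from the paper, while the step size and iteration count here are assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, step_size=0.01, n_steps=10):
    """L-infinity PGD: ascend the loss with sign-gradient steps,
    projecting back onto the eps-ball around x after each step."""
    delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(n_steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            delta += step_size * grad.sign()
            delta.clamp_(-eps, eps)                    # project onto the ball
            delta.copy_((x + delta).clamp(0, 1) - x)   # keep pixels valid
    return (x + delta).detach()
```

For the paper's settings, one would call this with `eps=0.3` on MNIST or `eps=8/255` on CIFAR-10 and count correct predictions on the returned adversarial examples to obtain robust accuracy.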
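The optimizer and schedule details in the Experiment Setup row translate directly into a standard PyTorch configuration. This sketch only restates what the row reports (SGD with momentum 0.9, weight decay 5e-4, batch size 128, and the two step-decay schedules); the `make_training_setup` helper and its dataset keys are hypothetical names, not from the paper.

```python
import torch

def make_training_setup(model, dataset):
    """Optimizer/scheduler settings as reported in the Experiment Setup row.
    Batch size is 128 for both datasets."""
    if dataset == "mnist":       # 100 epochs, lr 0.01, decayed by 0.1 every 30
        lr, step, epochs = 0.01, 30, 100
    elif dataset == "cifar10":   # 200 epochs, lr 0.1, decayed by 0.1 every 75
        lr, step, epochs = 0.1, 75, 200
    else:
        raise ValueError(f"unknown dataset: {dataset}")
    optimizer = torch.optim.SGD(model.parameters(), lr=lr,
                                momentum=0.9, weight_decay=5e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                step_size=step, gamma=0.1)
    return optimizer, scheduler, epochs
```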