Robustness between the worst and average case
Authors: Leslie Rice, Anna Bair, Huan Zhang, J. Zico Kolter
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our approach provides substantially better estimates than simple random sampling of the actual intermediate-q robustness of standard, data-augmented, and adversarially-trained classifiers, illustrating a clear tradeoff between classifiers that optimize different metrics. |
| Researcher Affiliation | Collaboration | Leslie Rice, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, EMAIL; Anna Bair, Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA, EMAIL; Huan Zhang, Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, EMAIL; J. Zico Kolter, Department of Computer Science, Carnegie Mellon University & Bosch Center for Artificial Intelligence, Pittsburgh, PA, EMAIL |
| Pseudocode | Yes | Algorithm 1: Evaluating the intermediate-q robustness of a neural network function h using path sampling estimation with m MCMC samples, with (x, y) ∼ D, for some norm q. |
| Open Source Code | Yes | Code for reproducing experiments can be found at https://github.com/locuslab/intermediate_robustness. |
| Open Datasets | Yes | All of our experiments are either run on the MNIST dataset [LeCun et al., 1998] or the CIFAR-10 dataset [Krizhevsky et al., 2009]. |
| Dataset Splits | No | The paper mentions using MNIST and CIFAR-10 datasets for experiments but does not provide specific details on training, validation, and test splits (e.g., percentages, sample counts, or explicit mention of a validation set for hyperparameter tuning). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory configurations. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers, such as Python version, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries used in the implementation. |
| Experiment Setup | Yes | On MNIST, ẐMC is computed with m = 2000, ẐPS+HMC with m = 100 and L = 20, and Adv. loss corresponds to PGD with 100 iterations. On CIFAR-10, ẐMC is computed with m = 500, ẐPS+HMC with m = 50 and L = 10, and Adv. loss corresponds to PGD with 50 iterations and 10 restarts. For the MC estimate computed during training, we use m = 50 samples, whereas for the PS+HMC estimate we use m = 25 samples with L = 2 leapfrog steps. |
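The simple-random-sampling baseline (ẐMC) that the paper's method is compared against can be illustrated with a short sketch. The intermediate-q robustness interpolates between average-case (q = 1) and worst-case (q → ∞) loss; a plain Monte Carlo estimate draws m uniform perturbations from the threat-model ball and takes the q-power mean of the losses. The helper name `mc_intermediate_q_robustness`, the ℓ∞ ball, and the `loss_fn` signature are illustrative assumptions here, not the authors' implementation.

```python
import numpy as np

def mc_intermediate_q_robustness(loss_fn, x, y, eps, q, m=2000, seed=None):
    """Plain Monte Carlo estimate of intermediate-q robustness
    (q-power mean of the loss over uniform perturbations in an
    l-infinity ball of radius eps).

    Hypothetical helper for illustration, not the paper's code:
    loss_fn(x_perturbed, y) is assumed to return a scalar loss.
    """
    rng = np.random.default_rng(seed)
    losses = np.empty(m)
    for i in range(m):
        # Sample a perturbation uniformly from the l-inf ball of radius eps.
        delta = rng.uniform(-eps, eps, size=np.shape(x))
        losses[i] = loss_fn(x + delta, y)
    # q-power mean: recovers the average loss at q = 1 and tends
    # toward the maximum loss as q grows.
    return np.mean(losses ** q) ** (1.0 / q)
```

For large q this naive estimator has high variance, since the q-power mean is dominated by rare high-loss perturbations; that variance gap is what motivates the paper's path-sampling estimator.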
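The L reported in the setup row is the number of leapfrog steps per Hamiltonian Monte Carlo transition used inside the PS+HMC estimator. A minimal sketch of one such transition, in generic textbook form (Neal, 2011) rather than the authors' exact sampler over the perturbation space, looks like this; `U` and `grad_U` stand for the negative log-density of the tempered target and its gradient, both assumptions of this sketch:

```python
import numpy as np

def hmc_step(U, grad_U, delta, step_size, L, rng):
    """One HMC transition with L leapfrog steps targeting exp(-U(delta)).

    Generic textbook HMC, not the paper's implementation: U returns a
    scalar potential, grad_U its gradient with the same shape as delta.
    """
    p0 = rng.standard_normal(delta.shape)   # fresh Gaussian momentum
    d, p = delta.copy(), p0.copy()
    p = p - 0.5 * step_size * grad_U(d)     # initial half step for momentum
    for i in range(L):
        d = d + step_size * p               # full step for position
        if i < L - 1:
            p = p - step_size * grad_U(d)   # full step for momentum
    p = p - 0.5 * step_size * grad_U(d)     # final half step for momentum
    # Metropolis correction: accept or reject based on the change in
    # total energy, which keeps the target distribution exact.
    h_old = U(delta) + 0.5 * np.sum(p0 ** 2)
    h_new = U(d) + 0.5 * np.sum(p ** 2)
    if rng.uniform() < np.exp(h_old - h_new):
        return d
    return delta
```

In the paper's setting each sampler update would additionally clip or reject proposals to stay inside the perturbation ball; small L (e.g. the L = 2 used during training) trades mixing speed for fewer gradient evaluations per sample.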