Revisiting adversarial training for the worst-performing class
Authors: Thomas Pethick, Grigorios Chrysos, Volkan Cevher
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate an improvement to 32% in the worst class accuracy on CIFAR10, and we observe consistent behavior across CIFAR100 and STL10. Our study highlights the importance of moving beyond average accuracy, which is particularly important in safety-critical applications. ... We carry out extensive experiments comparing CFOL against three strong baselines across three datasets, where we consistently observe that CFOL improves the weakest classes. ... 5 Experiments |
| Researcher Affiliation | Academia | Thomas Pethick EMAIL École Polytechnique Fédérale de Lausanne (EPFL) Grigorios G Chrysos EMAIL École Polytechnique Fédérale de Lausanne (EPFL) Volkan Cevher EMAIL École Polytechnique Fédérale de Lausanne (EPFL) |
| Pseudocode | Yes | Algorithm 1: Class focused online learning (CFOL) |
| Open Source Code | No | The paper provides pseudocode (Listing 1) and mentions a third-party library 'robustness' with a GitHub URL (Engstrom et al., 2019), but it does not explicitly state that the authors' own implementation code for CFOL is open-source or provide a direct link to their repository. |
| Open Datasets | Yes | We test on three datasets with different dimensionality, number of examples per class and number of classes. Specifically, we consider CIFAR10, CIFAR100 and STL10 (Krizhevsky et al., 2009; Coates et al., 2011) (see Appendix C.2 for further details). ... Tiny ImageNet (Russakovsky et al., 2015) ... Imagenette (https://github.com/fastai/imagenette) |
| Dataset Splits | No | The paper mentions using a "validation set" for early stopping and describes the total number of training examples for some datasets (e.g., "CIFAR10 includes 50,000 training examples"), but it does not specify the exact percentages or absolute counts for training, validation, and test splits needed to reproduce the experiments. For example, it doesn't state how the validation set was created from the training data or the size of the test set. |
| Hardware Specification | No | We use one GPU on an internal cluster. (Appendix C) |
| Software Dependencies | No | The paper mentions "pytorch pseudo code" in Listing 1 but does not specify any version numbers for PyTorch or other software dependencies, which are required for a reproducible description. |
| Experiment Setup | Yes | Hyper-parameters: Unless otherwise noted, we use the standard adversarial training setup of a ResNet-18 network (He et al., 2016) with a learning rate τ = 0.1, momentum of 0.9, weight decay of 5·10⁻⁴, and batch size of 128, with a piece-wise constant learning rate decay of 0.1 at epochs 100 and 150 for a total of 200 epochs, following Madry et al. (2017). For the attack we similarly adopt the common attack radius of 8/255 using 7 steps of projected gradient descent (PGD) with a step size of 2/255 (Madry et al., 2017). For evaluation we use the stronger 20-step attack throughout, except for Table 7 C where we show robustness against AutoAttack (Croce & Hein, 2020). |
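The attack configuration quoted in the last row (radius 8/255, 7 PGD steps, step size 2/255) follows the standard ℓ∞ PGD recipe of Madry et al. A minimal sketch of that recipe is below; it is not the authors' implementation, and the `grad_fn` here is a toy analytic gradient standing in for backpropagation through the actual network.

```python
import numpy as np

def pgd_linf(x, grad_fn, eps=8/255, step=2/255, steps=7, rng=None):
    """L-infinity PGD: ascend the loss via sign-gradient steps, projecting
    each iterate back into the eps-ball around the clean input x and
    clipping to the valid pixel range [0, 1]."""
    rng = rng or np.random.default_rng(0)
    # Random start inside the eps-ball, as is standard for PGD.
    x_adv = np.clip(x + rng.uniform(-eps, eps, x.shape), 0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv + step * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project into eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep valid pixels
    return x_adv

# Toy stand-in for the network loss: L(x) = w . x, so grad(x) = w.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.5, 0.5, 0.5])
x_adv = pgd_linf(x, grad_fn=lambda z: w)
# The perturbation never exceeds the attack radius.
assert np.max(np.abs(x_adv - x)) <= 8/255 + 1e-9
```

With 7 steps of size 2/255 the cumulative movement (14/255) exceeds the radius 8/255, so the projection step is what keeps the perturbation inside the ball; the paper's 20-step evaluation attack only changes `steps`.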