Provable Robust Overfitting Mitigation in Wasserstein Distributionally Robust Optimization

Authors: Shuang Liu, Yihan Wang, Yifan Zhu, Yibo Miao, Xiao-Shan Gao

ICLR 2025

Reproducibility assessment (Variable / Result / LLM Response):
Research Type: Experimental. "Finally, through extensive experiments, we demonstrate that our method significantly mitigates robust overfitting and enhances robustness within the framework of WDRO. ... We conduct extensive evaluations on benchmark datasets and our results show that the SR-WDRO approach effectively mitigates robust overfitting and outperforms other existing robust methods in terms of adversarial robustness. ... In this section, we investigate the efficacy of our SR-WDRO training through extensive experiments on the CIFAR-10 and CIFAR-100 datasets. ... Figure 2: Comparison of SR-WDRO against other robust training methods on CIFAR10 (ε = 8/255). Left: Robust test accuracy. Right: Robust test loss. Our method (green) demonstrates competitive performance in both metrics, particularly in mitigating robust overfitting and higher robust test accuracy."
Researcher Affiliation: Academia. Shuang Liu, Yihan Wang, Yifan Zhu, Yibo Miao, Xiao-Shan Gao; State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Pseudocode: Yes. Algorithm 1 (Statistically Robust WDRO Training). Input: training set Sn; number of iterations T; batch size N; learning rates ηθ, ηλ; adversary parameters: attack budget ε, steps K, step size η. Output: robust model θ.
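The inner adversary described by Algorithm 1's parameters (attack budget ε, K steps, step size η) is a standard l∞-bounded PGD loop. A minimal NumPy sketch under that reading, using a toy closed-form gradient in place of a network loss; the function name `pgd_linf` and the toy `grad_fn` are illustrative, not from the paper:

```python
import numpy as np

def pgd_linf(x, grad_fn, eps=8/255, steps=10, eta=2/255):
    """l_inf-bounded PGD: signed gradient ascent, projected onto the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)                        # gradient of the loss w.r.t. the input
        x_adv = x_adv + eta * np.sign(g)          # ascent step of size eta
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the l_inf eps-ball
    return x_adv

# Toy example: a loss whose input-gradient is z + 1 (always positive at the origin),
# so PGD pushes every coordinate up until the budget clips it.
x0 = np.zeros(4)
adv = pgd_linf(x0, grad_fn=lambda z: z + 1.0)
assert np.all(np.abs(adv - x0) <= 8/255 + 1e-12)  # perturbation respects the budget
```

In the paper's Algorithm 1, a step like this would run inside the outer SGD loop to generate adversarial examples before each parameter update.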
Open Source Code: Yes. "The implementation of our approach is publicly available at the following GitHub repository: https://github.com/hong-xian/SR-WDRO."
Open Datasets: Yes. "In this section, we investigate the efficacy of our SR-WDRO training through extensive experiments on the CIFAR-10 and CIFAR-100 datasets."
Dataset Splits: No. The paper uses standard benchmarks (CIFAR-10 and CIFAR-100) and refers to the 'training set Sn' and 'test data' throughout the experimental section, but it does not give explicit split percentages (e.g., 80/10/10) or absolute sample counts for training, validation, or test sets.
Hardware Specification: Yes. "Table 8: Training time per epoch and total training time for CIFAR-10 on a single NVIDIA A800 GPU."
Software Dependencies: No. The paper mentions using 'SGD as the optimizer' and the 'ResNet-18' architecture, and refers to attack methods such as 'PGD-AT' and 'Auto-Attack', but it does not give version numbers for any software libraries (e.g., Python, PyTorch, TensorFlow, CUDA) or other key components of the implementation.
Experiment Setup: Yes. "We train ResNet-18 (He et al., 2016) with 200 epochs, and use SGD as the optimizer with learning rate decay by 0.1 at the epoch 100 and 150. For all methods, we implement adversarial training with {k = 10, ε = 8/255, η = 2/255} where k is the iteration number, ε is the attack budget and η is the step size. We use different attacks to evaluate the defense methods, including: 1) PGD-10 with {k = 10, ε = 8/255, η = ε/4}, 2) PGD-200 with {k = 200, ε = 8/255, η = ε/4}, 3) Auto-Attack (AA) (Croce & Hein, 2020) with ε = 8/255. The l∞-norm is used for all measures. Unless otherwise specified, we set γ = 0.1 to its default value. ... We use the SGD optimizer with momentum 0.9, weight decay 5e-4. The starting learning rate is 0.1 and reduce the learning rate ( × 0.1) at epoch {100, 150}. We train with 200 epochs."
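The quoted learning-rate schedule (start at 0.1, multiply by 0.1 at epochs 100 and 150, over 200 epochs) is a step-decay schedule that can be sketched as a small helper. The function name `lr_at_epoch` and its keyword defaults are illustrative choices matching the quoted setup, not identifiers from the paper:

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(100, 150), gamma=0.1):
    """Return the SGD learning rate at a given epoch under step decay:
    the base rate is multiplied by gamma at each milestone already passed."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Epochs 0-99 train at 0.1, epochs 100-149 at ~0.01, epochs 150-199 at ~0.001.
print(lr_at_epoch(0), lr_at_epoch(120), lr_at_epoch(180))
```

In a PyTorch implementation this corresponds to `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[100, 150]` and `gamma=0.1`.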