Provable Robust Overfitting Mitigation in Wasserstein Distributionally Robust Optimization

Authors: Shuang Liu, Yihan Wang, Yifan Zhu, Yibo Miao, Xiao-Shan Gao

ICLR 2025

Reproducibility assessment (Variable / Result / LLM Response):
Research Type: Experimental. "Finally, through extensive experiments, we demonstrate that our method significantly mitigates robust overfitting and enhances robustness within the framework of WDRO. ... We conduct extensive evaluations on benchmark datasets and our results show that the SR-WDRO approach effectively mitigates robust overfitting and outperforms other existing robust methods in terms of adversarial robustness. ... In this section, we investigate the efficacy of our SR-WDRO training through extensive experiments on the CIFAR-10 and CIFAR-100 datasets. ... Figure 2: Comparison of SR-WDRO against other robust training methods on CIFAR10 (ε = 8/255). Left: Robust test accuracy. Right: Robust test loss. Our method (green) demonstrates competitive performance in both metrics, particularly in mitigating robust overfitting and higher robust test accuracy."
Researcher Affiliation: Academia. Shuang Liu, Yihan Wang, Yifan Zhu, Yibo Miao, Xiao-Shan Gao; State Key Laboratory of Mathematical Sciences, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China.
Pseudocode: Yes. Algorithm 1 (Statistically Robust WDRO Training). Input: training set Sn; number of iterations T; batch size N; learning rates ηθ, ηλ; adversary parameters: attack budget ε, steps K, step size η. Output: robust model θ.
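The inner adversary described by Algorithm 1's parameters (attack budget ε, K steps, step size η) is a standard l∞-bounded PGD loop. A minimal NumPy sketch under that reading, using a toy closed-form gradient in place of a network loss; the function name `pgd_linf` and the toy `grad_fn` are illustrative, not from the paper:

```python
import numpy as np

def pgd_linf(x, grad_fn, eps=8/255, steps=10, eta=2/255):
    """l_inf-bounded PGD: signed gradient ascent, projected onto the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)                        # gradient of the loss w.r.t. the input
        x_adv = x_adv + eta * np.sign(g)          # ascent step of size eta
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into the l_inf eps-ball
    return x_adv

# Toy example: a loss whose input-gradient is z + 1 (always positive at the origin),
# so PGD pushes every coordinate up until the budget clips it.
x0 = np.zeros(4)
adv = pgd_linf(x0, grad_fn=lambda z: z + 1.0)
assert np.all(np.abs(adv - x0) <= 8/255 + 1e-12)  # perturbation respects the budget
```

In the paper's Algorithm 1, a step like this would run inside the outer SGD loop to generate adversarial examples before each parameter update.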
Open Source Code: Yes. "The implementation of our approach is publicly available at the following GitHub repository: https://github.com/hong-xian/SR-WDRO."
Open Datasets: Yes. "In this section, we investigate the efficacy of our SR-WDRO training through extensive experiments on the CIFAR-10 and CIFAR-100 datasets."
Dataset Splits: No. The paper uses standard benchmarks (CIFAR-10 and CIFAR-100) and refers to the 'training set Sn' and 'test data' throughout the experimental section, but it does not give explicit split percentages (e.g., 80/10/10) or absolute sample counts for training, validation, or test sets.
Hardware Specification: Yes. "Table 8: Training time per epoch and total training time for CIFAR-10 on a single NVIDIA A800 GPU."
Software Dependencies: No. The paper mentions using 'SGD as the optimizer' and the 'ResNet-18' architecture, and refers to attack methods such as 'PGD-AT' and 'Auto-Attack', but it does not give version numbers for any software libraries (e.g., Python, PyTorch, TensorFlow, CUDA) or other key components of the implementation.
Experiment Setup: Yes. "We train ResNet-18 (He et al., 2016) with 200 epochs, and use SGD as the optimizer with learning rate decay by 0.1 at the epoch 100 and 150. For all methods, we implement adversarial training with {k = 10, ε = 8/255, η = 2/255} where k is the iteration number, ε is the attack budget and η is the step size. We use different attacks to evaluate the defense methods, including: 1) PGD-10 with {k = 10, ε = 8/255, η = ε/4}, 2) PGD-200 with {k = 200, ε = 8/255, η = ε/4}, 3) Auto-Attack (AA) (Croce & Hein, 2020) with ε = 8/255. The l∞-norm is used for all measures. Unless otherwise specified, we set γ = 0.1 to its default value. ... We use the SGD optimizer with momentum 0.9, weight decay 5e-4. The starting learning rate is 0.1 and reduce the learning rate ( × 0.1) at epoch {100, 150}. We train with 200 epochs."
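The quoted learning-rate schedule (start at 0.1, multiply by 0.1 at epochs 100 and 150, over 200 epochs) is a step-decay schedule that can be sketched as a small helper. The function name `lr_at_epoch` and its keyword defaults are illustrative choices matching the quoted setup, not identifiers from the paper:

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(100, 150), gamma=0.1):
    """Return the SGD learning rate at a given epoch under step decay:
    the base rate is multiplied by gamma at each milestone already passed."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Epochs 0-99 train at 0.1, epochs 100-149 at ~0.01, epochs 150-199 at ~0.001.
print(lr_at_epoch(0), lr_at_epoch(120), lr_at_epoch(180))
```

In a PyTorch implementation this corresponds to `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[100, 150]` and `gamma=0.1`.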