Revisiting Large-Scale Non-convex Distributionally Robust Optimization

Authors: Qi Zhang, Yi Zhou, Simon Khan, Ashley Prater-Bennette, Lixin Shen, Shaofeng Zou

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Our theoretical results and insights are further verified numerically on a number of tasks, and our algorithms outperform the existing DRO method (Jin et al., 2021). [...] In this section, we conduct numerical studies on a set of regression tasks (Chen et al., 2023) on the life expectancy data. This dataset consists of N = 2413 samples, where we select the first 2000 samples for training and the remaining samples for testing. [...] In Figure 2, we provide the training curves with fine-tuned learning rates for SGD, Normalized SPIDER (Chen et al., 2023), Normalized-SGD with momentum (Jin et al., 2021), and our proposed D-SGD-C and D-SPIDER-C methods. Our D-SPIDER-C performs similarly to Normalized SPIDER, and both of our algorithms outperform the SGD and Normalized-SGD with momentum methods.
Researcher Affiliation Academia School of Electrical, Computer and Energy Engineering, Arizona State University¹; Department of Computer Science and Engineering, Texas A&M University²; Information Directorate, Air Force Research Laboratory³; Department of Mathematics, Syracuse University⁴
Pseudocode Yes Algorithm 1 D-GD [...] Algorithm 2 D-SGD-C [...] Algorithm 3 D-SPIDER-C [...] Algorithm 4 D-SGD-M
Open Source Code No The paper does not explicitly state that source code is provided, nor does it include a link to a code repository.
Open Datasets Yes In this section, we conduct numerical studies on a set of regression tasks (Chen et al., 2023) on the life expectancy data. This dataset consists of N = 2413 samples, where we select the first 2000 samples for training and the remaining samples for testing. [...] In this part, we conduct experiments on the famous CIFAR-10 dataset (Alex, 2009)
Dataset Splits Yes This dataset consists of N = 2413 samples, where we select the first 2000 samples for training and the remaining samples for testing.
Hardware Specification No The paper does not provide specific details about the hardware used for running its experiments.
Software Dependencies No The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup Yes The non-convex original loss function is set as ℓ(x, (z_i, y_i)) = (1/2)(y_i − z_i^⊤x)^2 + 0.1 Σ_{j=1}^{34} ln(1 + |x_j|), where x = (x_1, x_2, ..., x_34) is the trainable parameter. For the DRO model, λ is set to 0.01, and the initial value η_0 is set to 0.1. [...] The iteration number is set to 50. For existing methods, we follow the fine-tuned learning rates in (Chen et al., 2023), where the step size is β_t = 10^{-4} for GD, β_t = 0.2 for normalized GD, and β_t = 0.3 · min(1/10, 1/‖∇_{x,η}L(x_t, η_t)‖). For our D-GD method, we set α_t = β_t = 10^{-4}, and for our D-GD-C method, we set α_t = 10^{-4} and β_t = 0.35 · min(1/2000, 1/‖∇_x L(x_t, η_{t+1})‖). [...] In our stochastic setting, we run the experiments for 5000 iterations. We set the mini-batch size to 50. For SGD, the step size is β_t = 2 × 10^{-4}. For the normalized SGD with momentum method, the momentum coefficient is set to 10^{-4} and the step size is set to 8 × 10^{-3}. For the normalized SPIDER method, the step size is β_t = 4 × 10^{-3} and the epoch size is q = 20. For our D-SGD-C, we set α_t = 8 × 10^{-5} and β_t = 0.05 · min(1/100, 1/‖v_t‖). For our D-SPIDER-C, we set α_t = 8 × 10^{-5} and β_t = 7.5 × 10^{-3} · min(2.5, 1/‖v_t‖).
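The quoted setup can be sketched in code. The snippet below is an illustrative reconstruction, not the authors' implementation: `loss` evaluates the non-convex per-sample objective ℓ(x, (z_i, y_i)) above, and `clipped_step` applies the clipped step-size rule β_t = c · min(a, 1/‖v_t‖) shared by D-SGD-C and D-SPIDER-C; the sample values and function names are placeholders.

```python
import numpy as np

def loss(x, z, y):
    """Non-convex per-sample loss from the quoted setup:
    1/2 * (y - z^T x)^2 + 0.1 * sum_j ln(1 + |x_j|)."""
    residual = y - z @ x
    return 0.5 * residual ** 2 + 0.1 * np.sum(np.log1p(np.abs(x)))

def clipped_step(v, c, a):
    """Clipped step size beta_t = c * min(a, 1/||v_t||): a constant
    step while the gradient estimate v_t is small, a normalized step
    once ||v_t|| exceeds 1/a."""
    return c * min(a, 1.0 / np.linalg.norm(v))

# At x = 0 the log regularizer vanishes, so only the squared error remains.
x = np.zeros(34)                # 34 trainable parameters, as in the paper
z, y = np.ones(34), 1.0         # placeholder sample, not the real data
print(loss(x, z, y))            # -> 0.5

# D-SGD-C's quoted constants: c = 0.05, a = 1/100.
print(clipped_step(np.ones(34), c=0.05, a=1 / 100))
```

The clipping keeps the update bounded regardless of how large the gradient estimate v_t becomes, which is the mechanism the D-*-C variants rely on in the non-convex DRO setting.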