Revisiting Large-Scale Non-convex Distributionally Robust Optimization
Authors: Qi Zhang, Yi Zhou, Simon Khan, Ashley Prater-Bennette, Lixin Shen, Shaofeng Zou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical results and insights are further verified numerically on a number of tasks, and our algorithms outperform the existing DRO method (Jin et al., 2021). [...] In this section, we conduct numerical studies on a set of regression tasks (Chen et al., 2023) on the life expectancy data. This dataset consists of N = 2413 samples, where we select the first 2000 samples for training and the remaining samples for testing. [...] In Figure 2, we provide the training curves with fine-tuned learning rate for SGD, Normalized SPIDER (Chen et al., 2023), Normalized-SGD with momentum (Jin et al., 2021) and our proposed D-SGD-C and D-SPIDER-C methods. Our D-SPIDER-C has similar performance compared with Normalized-SPIDER, and both of our algorithms outperform the SGD and Normalized-SGD with momentum methods. |
| Researcher Affiliation | Academia | School of Electrical, Computer and Energy Engineering, Arizona State University; Department of Computer Science and Engineering, Texas A&M University; Information Directorate, Air Force Research Laboratory; Department of Mathematics, Syracuse University |
| Pseudocode | Yes | Algorithm 1 D-GD [...] Algorithm 2 D-SGD-C [...] Algorithm 3 D-Spider-C [...] Algorithm 4 D-SGD-M |
| Open Source Code | No | The paper does not explicitly state that source code is provided, nor does it include a link to a code repository. |
| Open Datasets | Yes | In this section, we conduct numerical studies on a set of regression tasks (Chen et al., 2023) on the life expectancy data. This dataset consists of N = 2413 samples, where we select the first 2000 samples for training and the remaining samples for testing. [...] In this part, we conduct experiments on the famous CIFAR-10 dataset (Alex, 2009) |
| Dataset Splits | Yes | This dataset consists of N = 2413 samples, where we select the first 2000 samples for training and the remaining samples for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | The non-convex original loss function is set as ℓ(x, (z_i, y_i)) = (1/2)(y_i − z_i^T x)^2 + 0.1 ∑_{j=1}^{34} ln(1 + \|x_j\|), where x = (x_1, x_2, ..., x_34) is the trainable parameter. For the DRO model, λ is set to 0.01, and the initial value η_0 is set to 0.1. [...] The iteration number is set to 50. For existing methods, we follow the fine-tuned learning rates in (Chen et al., 2023), where the step size is β_t = 10^{-4} for GD, β_t = 0.2 for normalized GD, and β_t = 0.3 · min(1/10, 1/‖∇_{x,η} L(x_t, η_t)‖). For our D-GD method, we set α_t = β_t = 10^{-4}, and for our D-GD-C method, we set α_t = 10^{-4} and β_t = 0.35 · min(1/2000, 1/‖∇_x L(x_t, η_{t+1})‖). [...] In our stochastic setting, we run the experiments for 5000 iterations. We set the mini-batch size to 50. For SGD, the step size is β_t = 2 × 10^{-4}. For the normalized SGD with momentum method, the momentum coefficient is set to 10^{-4} and the step size is set to 8 × 10^{-3}. For the normalized SPIDER method, the step size is β_t = 4 × 10^{-3} and the epoch size is q = 20. For our D-SGD-C, we set α_t = 8 × 10^{-5} and β_t = 0.05 · min(1/100, 1/‖v_t‖). For our D-SPIDER-C, we set α_t = 8 × 10^{-5} and β_t = 7.5 × 10^{-3} · min(2.5, 1/‖v_t‖). |
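The experiment-setup cell above describes two concrete ingredients: a per-sample non-convex regression loss (a squared error plus a log-of-absolute-value penalty) and a clipped step-size rule of the form β_t = c · min(a, 1/‖v_t‖). The sketch below illustrates both in NumPy; the function names `nonconvex_loss` and `clipped_step` are our own labels for illustration, not identifiers from the paper, and this is a minimal sketch of the quoted setup rather than the authors' implementation.

```python
import numpy as np

def nonconvex_loss(x, z, y, reg=0.1):
    """Per-sample loss from the quoted setup:
    0.5 * (y - z^T x)^2 + reg * sum_j ln(1 + |x_j|).
    The ln(1 + |x_j|) penalty is concave in |x_j|, which makes
    the overall objective non-convex.
    """
    residual = y - z @ x
    return 0.5 * residual ** 2 + reg * np.sum(np.log1p(np.abs(x)))

def clipped_step(c, a, v):
    """Clipped step size beta_t = c * min(a, 1/||v_t||), the form shared by
    the normalized-GD, D-GD-C, D-SGD-C, and D-SPIDER-C rules quoted above.
    """
    return c * min(a, 1.0 / np.linalg.norm(v))

# Tiny illustration with d = 34 features, matching the paper's dimension.
rng = np.random.default_rng(0)
d = 34
x = rng.normal(size=d)   # trainable parameter
z = rng.normal(size=d)   # one feature vector
y = 1.0                  # its label
loss = nonconvex_loss(x, z, y)

# D-SGD-C-style step: c = 0.05, a = 1/100, v_t a gradient estimate.
v_t = rng.normal(size=d)
beta_t = clipped_step(0.05, 1 / 100, v_t)
```

When ‖v_t‖ is large the step behaves like normalized gradient descent (step ∝ 1/‖v_t‖), and when ‖v_t‖ is small the cap `a` keeps the step bounded; this matches the min(·, ·) structure of every clipped rule in the cell above.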