A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization
Authors: Junwen Qiu, Xiao Li, Andre Milzarek
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, numerical experiments are performed on nonconvex classification tasks to illustrate the efficiency of the proposed approach. |
| Researcher Affiliation | Academia | Junwen Qiu, School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China, and Industrial Systems Engineering & Management, National University of Singapore, Singapore, 119077, Singapore; Xiao Li and Andre Milzarek, School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China |
| Pseudocode | Yes | Algorithm 1 (norm-PRR: normal map-based proximal random reshuffling). Input: initial point z1 ∈ R^d, w1 = prox_{λϕ}(z1), and parameters {αk} ⊂ R++, λ > 0. For k = 1, 2, ...: generate a permutation πk of [n]; set zk_1 = zk and wk_1 = wk; for i = 1, 2, ..., n: compute zk_{i+1} = zk_i − αk(∇f(wk_i, πk_i) + (1/λ)(zk_i − wk_i)) and wk_{i+1} = prox_{λϕ}(zk_{i+1}); end; set z^{k+1} = zk_{n+1} and w^{k+1} = wk_{n+1}; end. |
| Open Source Code | No | No explicit statement about releasing code for the method described in the paper or a link to a code repository was found. The license mentioned refers to the paper itself, not the source code. |
| Open Datasets | Yes | In our tests, we use the datasets CINA, MNIST, and GISETTE. (In MNIST, we only keep two features). Datasets are available at http://www.causality.inf.ethz.ch/data and www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets. ... We now study multiclass image classification for the dataset CIFAR-10 (Krizhevsky, 2009). |
| Dataset Splits | Yes | In this experiment, we split the dataset into ntrain = 50,000 training samples and ntest = 10,000 test samples. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory details) used to run the experiments. It only mentions general computing environments without specific details. |
| Software Dependencies | No | We use adaptive step sizes (ReduceLROnPlateau with initial step size α = 0.1 in PyTorch (Paszke et al., 2019)). While PyTorch is mentioned, a specific version number for PyTorch itself is not provided. |
| Experiment Setup | Yes | We set w0 = 10, λ = 1, and use diminishing step sizes of the form αk = α/k. Here, α takes values from the set {1, 0.1, 0.01}. ... We choose n = 5000, d = 250 ... We run PSGD, norm-PRR, and e-PRR with w0 = e_n, λ = 1/L, and constant step sizes αk = 4/(Ln) and αk = 0.04/(Ln) ... For all algorithms, we use polynomial step sizes of the form αk = α/(L + k) with α ∈ {0.01, 0.05, 0.1, 0.5, 1}; the Lipschitz constant is L = 0.8·λmax(AAᵀ)/n; the index k represents the k-th epoch. We run each algorithm for 200 epochs with w0 = 0; this process is repeated 10 times for each dataset. The parameter λ is set to λ = 1. ... We use adaptive step sizes (ReduceLROnPlateau with initial step size α = 0.1 in PyTorch (Paszke et al., 2019)) and set λ = 10⁻² for norm-PRR (in both architectures). We train ResNet-18 and VGG-16 for 100 epochs with batch size 128 and run each algorithm 5 times independently. |
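The pseudocode quoted in the table can be sketched in plain NumPy. The sketch below is illustrative only: the choice ϕ = μ‖·‖₁ (so the prox is soft-thresholding), the sparse least-squares demo problem, and all parameter values are assumptions for demonstration, not the paper's experimental setup.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def norm_prr_epoch(z, w, grad_f_i, prox, n, alpha, lam, rng):
    """One outer iteration (epoch) k of norm-PRR, following Algorithm 1."""
    perm = rng.permutation(n)                 # random reshuffling: draw permutation pi^k of [n]
    for i in perm:
        # normal-map step: z <- z - alpha_k * (grad f(w, pi_i^k) + (1/lambda)(z - w))
        z = z - alpha * (grad_f_i(w, i) + (z - w) / lam)
        w = prox(z)                           # w <- prox_{lambda * phi}(z)
    return z, w

# Hypothetical demo problem: sparse least squares with component losses
# f(w, i) = 0.5 * (a_i @ w - b_i)^2 and nonsmooth term phi = mu * ||.||_1.
rng = np.random.default_rng(0)
n, d, mu, lam = 50, 10, 0.1, 1.0
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d) * (rng.random(d) < 0.3)   # sparse ground truth
b = A @ x_true + 0.01 * rng.standard_normal(n)

grad_f_i = lambda w, i: (A[i] @ w - b[i]) * A[i]
prox = lambda z: soft_threshold(z, lam * mu)
objective = lambda w: 0.5 * np.mean((A @ w - b) ** 2) + mu * np.abs(w).sum()

z = np.zeros(d)
w = prox(z)
start = objective(w)
for k in range(1, 21):                        # diminishing step sizes alpha_k = alpha / k
    z, w = norm_prr_epoch(z, w, grad_f_i, prox, n, alpha=0.01 / k, lam=lam, rng=rng)
```

Note the two-sequence structure: the gradient is always evaluated at the proximal point w, while the update is applied to z; this is what distinguishes the normal-map step from a standard proximal SGD step.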