A New Random Reshuffling Method for Nonsmooth Nonconvex Finite-sum Optimization

Authors: Junwen Qiu, Xiao Li, Andre Milzarek

JMLR 2025

Reproducibility assessment (each row: Variable: Result — supporting LLM response)
Research Type: Experimental — "Finally, numerical experiments are performed on nonconvex classification tasks to illustrate the efficiency of the proposed approach."
Researcher Affiliation: Academia — Junwen Qiu (School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China; and Industrial Systems Engineering & Management, National University of Singapore, Singapore 119077, Singapore); Xiao Li and Andre Milzarek (School of Data Science, The Chinese University of Hong Kong, Shenzhen, Guangdong, 518172, P.R. China).
Pseudocode: Yes — Algorithm 1 (norm-PRR: normal map-based proximal random reshuffling):
  Input: initial point z^1 ∈ R^d, w^1 = prox_{λφ}(z^1), and parameters {α_k}_k ⊂ R_{++}, λ > 0
  for k = 1, 2, ... do
    Generate a permutation π^k of [n]. Set z^k_1 = z^k and w^k_1 = w^k.
    for i = 1, 2, ..., n do
      Compute z^k_{i+1} = z^k_i − α_k(∇f(w^k_i, π^k_i) + (1/λ)(z^k_i − w^k_i)) and w^k_{i+1} = prox_{λφ}(z^k_{i+1}).
    end for
    Set z^{k+1} = z^k_{n+1} and w^{k+1} = w^k_{n+1}.
  end for
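For concreteness, the algorithm can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: the toy least-squares problem, the ℓ1 regularizer with weight 0.01, and all names (`norm_prr`, `soft_threshold`, `grad`, `prox`) are assumptions chosen for the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def norm_prr(grad_f, prox, z1, n, alphas, lam, seed=0):
    """Sketch of norm-PRR. grad_f(w, i) is the gradient of the i-th
    component at w; prox(z, lam) is the proximal operator of lam * phi."""
    rng = np.random.default_rng(seed)
    z = z1.astype(float).copy()
    w = prox(z, lam)
    for alpha in alphas:                  # one step size per epoch
        for i in rng.permutation(n):      # reshuffled pass over the components
            # normal-map step: sampled component gradient at the proximal
            # point w, plus the proximal residual (z - w) / lam
            z = z - alpha * (grad_f(w, i) + (z - w) / lam)
            w = prox(z, lam)
    return w

# Toy finite sum: f(w, i) = 0.5 * (a_i @ w - b_i)^2, phi = 0.01 * ||.||_1
n, d = 50, 5
rng = np.random.default_rng(0)
A = rng.standard_normal((n, d))
x_true = np.zeros(d)
x_true[:2] = 1.0
b = A @ x_true

grad = lambda w, i: A[i] * (A[i] @ w - b[i])
prox = lambda z, lam: soft_threshold(z, 0.01 * lam)
alphas = [0.1 / k for k in range(1, 101)]   # diminishing alpha_k = alpha / k
w = norm_prr(grad, prox, np.zeros(d), n, alphas, lam=1.0)
```

Note that the inner loop updates `z` with the component gradient evaluated at the proximal point `w`, which is what distinguishes the normal-map step from a plain proximal SGD update.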
Open Source Code: No — No explicit statement about releasing code for the described method, and no link to a code repository, was found. The license mentioned refers to the paper itself, not to the source code.
Open Datasets: Yes — "In our tests, we use the datasets CINA, MNIST, and GISETTE. (In MNIST, we only keep two features). Datasets are available at http://www.causality.inf.ethz.ch/data and www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets. ... We now study multiclass image classification for the dataset CIFAR-10 (Krizhevsky, 2009)."
Dataset Splits: Yes — "In this experiment, we split the dataset into n_train = 50,000 training samples and n_test = 10,000 test samples."
Hardware Specification: No — The paper does not describe the specific hardware (e.g., GPU/CPU models, memory) used to run the experiments; it only mentions general computing environments without details.
Software Dependencies: No — "We use adaptive step sizes (ReduceLROnPlateau with initial step size α = 0.1 in PyTorch (Paszke et al., 2019))." While PyTorch is mentioned, no specific PyTorch version number is provided.
Experiment Setup: Yes — "We set w^0 = 10, λ = 1, and use diminishing step sizes of the form α_k = α/k. Here, α takes values from the set {1, 0.1, 0.01}. ... We choose n = 5000, d = 250 ... We run PSGD, norm-PRR, and e-PRR with w^0 = e_n, λ = 1/L, and constant step sizes α_k = 4/(Ln) and α_k = 0.04/(Ln) ... For all algorithms, we use polynomial step sizes of the form α_k = α/(L + k) with α ∈ {0.01, 0.05, 0.1, 0.5, 1}; the Lipschitz constant is L = 0.8·λ_max(AA^⊤)/n; the index k represents the k-th epoch. We run each algorithm for 200 epochs with w^0 = 0; this process is repeated 10 times for each dataset. The parameter λ is set to λ = 1. ... We use adaptive step sizes (ReduceLROnPlateau with initial step size α = 0.1 in PyTorch (Paszke et al., 2019)) and set λ = 10^{-2} for norm-PRR (in both architectures). We train ResNet-18 and VGG-16 for 100 epochs with batch size 128 and run each algorithm 5 times independently."
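The two step-size schedules quoted above can be sketched in a few lines of pure Python. This is an illustration only: `polynomial_steps` implements the stated rule α_k = α/(L + k), while `PlateauStepSize` is a simplified, hypothetical stand-in for the loss-plateau logic of PyTorch's ReduceLROnPlateau (the real scheduler has additional options such as `threshold`, `cooldown`, and `mode`); all names and default values here are assumptions.

```python
def polynomial_steps(alpha, L, epochs):
    """Polynomial schedule alpha_k = alpha / (L + k), k = 1..epochs."""
    return [alpha / (L + k) for k in range(1, epochs + 1)]

class PlateauStepSize:
    """Minimal plateau-based schedule: shrink the step size by `factor`
    once the monitored loss has failed to improve for `patience` epochs."""
    def __init__(self, alpha=0.1, factor=0.5, patience=10):
        self.alpha, self.factor, self.patience = alpha, factor, patience
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, loss):
        if loss < self.best:
            self.best, self.bad_epochs = loss, 0   # improvement: reset counter
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:    # plateau: reduce step size
                self.alpha *= self.factor
                self.bad_epochs = 0
        return self.alpha
```

Used per epoch, e.g. `alpha = sched.step(epoch_loss)` with `sched = PlateauStepSize(alpha=0.1)`, this mimics the adaptive-step-size setup the paper describes for the ResNet-18/VGG-16 experiments.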