Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness

Authors: Qi He, Peiran Yu, Ziyi Chen, Heng Huang

ICML 2025

Reproducibility Assessment (each entry gives the variable, the result, and the LLM's supporting response):
Research Type: Experimental
"We conducted numerical experiments to demonstrate that the shuffling-type gradient algorithm converges faster than SGD on two important non-Lipschitz-smooth applications. [...] 5. Numerical Experiments: We compare the reshuffling gradient algorithm (Algorithm 1) with SGD on multiple ℓ-smooth optimization problems to demonstrate its effectiveness. Experiments are conducted with different shuffling schemes, on convex, strongly convex, and nonconvex objective functions, including synthetic functions, phase retrieval, distributionally robust optimization (DRO), and image classification. [...] Figure 1: Experimental Results on Convex (top) and Strongly Convex (bottom) Objective Functions. [...] Figure 3: Experimental Results on CIFAR-10 Dataset."
Researcher Affiliation: Academia
"1 Department of Computer Science, University of Maryland, College Park. 2 Department of Computer Science and Engineering, University of Texas at Arlington. Correspondence to: Qi He <EMAIL>, Heng Huang <EMAIL>."
Pseudocode: Yes
"Algorithm 1: Shuffling-type Gradient Algorithm"
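The Algorithm 1 named above is a shuffling-type gradient method; below is a minimal sketch of its common random-reshuffling form (the function name, toy objective, and constants are illustrative assumptions, not the paper's code):

```python
import numpy as np

def shuffling_gradient(component_grads, x0, n_epochs, eta):
    """Shuffling-type gradient method (random-reshuffling form):
    each epoch draws a fresh permutation of the n component indices
    and sweeps them once, with per-step stepsize eta / n (so eta
    plays the role of the per-epoch stepsize eta_t)."""
    rng = np.random.default_rng(0)
    n = len(component_grads)
    x = float(x0)
    for _ in range(n_epochs):
        for i in rng.permutation(n):   # reshuffle every epoch
            x -= (eta / n) * component_grads[i](x)
    return x

# Toy use: minimize the average of f_i(x) = (x - a_i)^2 / 2,
# whose minimizer is the mean of the a_i.
grads = [lambda x, a=a: x - a for a in (1.0, 2.0, 3.0)]
x_final = shuffling_gradient(grads, x0=0.0, n_epochs=200, eta=0.3)
```

With a constant stepsize the iterates settle into a small neighborhood of the minimizer; other shuffling schemes mentioned in the paper (e.g., a single shuffle) would only change how the permutation is drawn.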
Open Source Code: No
The paper does not provide an explicit statement about releasing code, nor does it link to a code repository. It mentions third-party models such as ResNet-18 but not the authors' own implementation.
Open Datasets: Yes
"https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who?resource=download [...] for image classification task on CIFAR-10 dataset (Krizhevsky, 2009)"
Dataset Splits: No
The paper mentions "We use the first 2000 samples {x_i, y_i}_{i=1}^{2000} with features x_i ∈ ℝ^34 and targets y_i ∈ ℝ for training." for the DRO problem, indicating a training set, but does not provide explicit splits (percentages or counts) for validation or test sets for any of the datasets used, nor does it refer to standard splits with citations for CIFAR-10.
Hardware Specification: No
The paper does not specify any particular hardware (e.g., CPU or GPU models, or cloud computing instance types) used for running the experiments. It only generally discusses training models.
Software Dependencies: No
The paper mentions training ResNet-18 and using cross-entropy loss, which implies the use of a deep learning framework, but it does not specify any software libraries or tools with version numbers.
Experiment Setup: Yes
"Specifically, for each SGD update x ← x − η∇f_{k,i}(x), the index (k, i) ∈ E is obtained uniformly at random. [...] We implement each algorithm 100 times with initialization x_0 = [1, . . . , 1] and fine-tuned stepsizes 0.01 (i.e., η = 0.01 for SGD and η_t/n = 0.01 for Algorithm 1) [...] all the stepsizes are fine-tuned to be 10^-5. [...] We select constant stepsizes 2 × 10^-6 and η_j^(t) = 0.007/m for SGD and Algorithm 1 respectively by fine-tuning [...] We use initialization η_0 = 0.1 and w_0 ∈ ℝ^34 from a standard Gaussian distribution. [...] stepsizes η_j^(t) = η_t/n = 10^-7. [...] with batch size 200 and stepsize 10^-3."
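The uniform-sampling rule quoted in the setup above contrasts with Algorithm 1's epoch-wise reshuffling; the sketch below is a hedged toy illustration of that baseline (the problem and names are assumptions, not the paper's experiments), using the quoted SGD stepsize convention η = 0.01:

```python
import numpy as np

def sgd_uniform(component_grads, x0, n_steps, eta=0.01, seed=0):
    """SGD baseline from the quoted setup: each update draws one
    component index uniformly at random, with replacement, and takes
    x <- x - eta * grad_i(x)."""
    rng = np.random.default_rng(seed)
    n = len(component_grads)
    x = float(x0)
    for _ in range(n_steps):
        i = int(rng.integers(n))   # index drawn uniformly at random
        x -= eta * component_grads[i](x)
    return x

# Toy comparison target: average of f_i(x) = (x - a_i)^2 / 2,
# minimized at the mean of the a_i.
grads = [lambda x, a=a: x - a for a in (1.0, 2.0, 3.0)]
x_final = sgd_uniform(grads, x0=0.0, n_steps=2000)
```

Sampling with replacement can revisit some components and skip others within a "pass" over the data, which is the sampling difference the paper's comparison against epoch-wise reshuffling isolates.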