Revisiting Convergence: Shuffling Complexity Beyond Lipschitz Smoothness
Authors: Qi He, Peiran Yu, Ziyi Chen, Heng Huang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted numerical experiments to demonstrate that the shuffling-type gradient algorithm converges faster than SGD on two important non-Lipschitz smooth applications. [...] 5. Numerical Experiments: We compare reshuffling gradient algorithm (Algorithm 1) with SGD on multiple ℓ-smooth optimization problems to prove its effectiveness. Experiments are conducted with different shuffling schemes, on convex, strongly convex and nonconvex objective functions, including synthetic functions, phase retrieval, distributionally robust optimization (DRO) and image classification. [...] Figure 1: Experimental Results on Convex (up) and Strongly-convex (down) Objective Functions. [...] Figure 3: Experimental Results on Cifar 10 Dataset. |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Maryland, College Park. 2Department of Computer Science and Engineering, University of Texas Arlington. Correspondence to: Qi He <EMAIL>, Heng Huang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Shuffling-type Gradient Algorithm |
| Open Source Code | No | The paper does not provide an explicit statement about releasing code, nor does it include a link to a code repository. It mentions third-party models like Resnet18 but not the authors' own implementation code. |
| Open Datasets | Yes | 1https://www.kaggle.com/datasets/kumarajarshi/life-expectancy-who?resource=download [...] for image classification task on Cifar 10 dataset (Krizhevsky, 2009) |
| Dataset Splits | No | The paper mentions 'We use the first 2000 samples {x_i, y_i}_{i=1}^{2000} with features x_i ∈ ℝ^{34} and targets y_i ∈ ℝ for training.' for the DRO problem, indicating a training set, but does not provide explicit splits (percentages or counts) for validation or test sets for any of the datasets used, nor does it refer to standard splits with citations for CIFAR-10. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models, or cloud computing instance types) used for running the experiments. It only generally discusses training models. |
| Software Dependencies | No | The paper mentions training Resnet18 and using cross-entropy loss, which implies the use of a deep learning framework, but it does not specify any software libraries or tools with version numbers. |
| Experiment Setup | Yes | Specifically, for each SGD update x ← x − η∇f_{k,i}(x), the pair (k, i) ∈ E is obtained uniformly at random. [...] We implement each algorithm 100 times with initialization x_0 = [1, . . . , 1] and fine-tuned stepsizes 0.01 (i.e., η = 0.01 for SGD and η_t/n = 0.01 for Algorithm 1) [...] all the stepsizes are fine-tuned to be 10^{-5}. [...] We select constant stepsizes 2×10^{-6} and η_j^{(t)} = 0.007/m for SGD and Algorithm 1 respectively by fine-tuning [...] We use initialization η_0 = 0.1 and w_0 ∈ ℝ^{34} from standard Gaussian distribution. [...] stepsizes η_j^{(t)} = η_t/n = 10^{-7}. [...] with batchsize 200 and stepsize 10^{-3}. |
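The setup row above contrasts two sampling schemes: SGD draws a component index uniformly at random at every step, while the shuffling-type method (Algorithm 1) permutes all indices once per epoch and sweeps through them. The sketch below illustrates that distinction on a toy quadratic objective; the component functions, stepsizes, and iteration counts are hypothetical stand-ins, not the paper's actual experiments.

```python
import random

# Toy components f_i(x) = 0.5 * (x - a_i)^2, so grad f_i(x) = x - a_i
# and the shared minimizer is mean(a). Purely illustrative.
def grad(x, a_i):
    return x - a_i

def sgd(a, x0=1.0, eta=0.05, n_updates=500, seed=0):
    """Plain SGD: each update picks one component uniformly at random."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_updates):
        i = rng.randrange(len(a))
        x -= eta * grad(x, a[i])
    return x

def shuffling(a, x0=1.0, eta=0.05, epochs=100, seed=0):
    """Shuffling-type scheme: draw a fresh permutation each epoch,
    then take one gradient step per component in that order
    (the random-reshuffling variant of Algorithm 1)."""
    rng = random.Random(seed)
    x = x0
    idx = list(range(len(a)))
    for _ in range(epochs):
        rng.shuffle(idx)  # re-permute once per epoch
        for i in idx:
            x -= eta * grad(x, a[i])
    return x

a = [0.0, 2.0, 4.0, 6.0, 8.0]  # minimizer of the sum is mean(a) = 4.0
x_sgd = sgd(a)
x_shuffle = shuffling(a)
```

Both runs approach the minimizer 4.0; with a constant stepsize, the reshuffling iterate typically lands closer at the end of an epoch, because each epoch uses every component exactly once rather than a noisy uniform sample.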