Stacey: Promoting Stochastic Steepest Descent via Accelerated $\ell_p$-Smooth Nonconvex Optimization

Authors: Xinyu Luo, Site Bai, Bolian Li, Petros Drineas, Ruqi Zhang, Brian Bullins

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present empirical evidence that the STACEY optimizer outperforms other optimizers in both convergence speed and accuracy. We evaluate STACEY's effectiveness on image classification (Section 5.1) and LLM pretraining (Section 5.2). The hyperparameter choices and tuning are summarized in Appendix C. Table 1. Image classification on CIFAR at the 50th, 100th, and 200th epochs. STACEY consistently outperforms other optimizers, demonstrating both superior accuracy and faster convergence. Table 2. Image classification on ImageNet at the 20th, 40th, and 60th epochs. STACEY demonstrates superior test accuracy and faster convergence compared to other optimizers. Figure 1. Learning curves of CIFAR classification with varying ℓp-norm.
Researcher Affiliation | Academia | Department of Computer Science, Purdue University, Indiana, USA. Correspondence to: Xinyu Luo <EMAIL>, Cedar Site Bai <EMAIL>, Bolian Li <EMAIL>.
Pseudocode | Yes | Algorithm 1: STACEY(p,2) Optimizer; Algorithm 2: STACEY(p,p) Optimizer; Algorithm 3: Stochastic ℓp Descent.
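The paper's Algorithm 3 (Stochastic ℓp Descent) is not reproduced in this report, but the classical ℓp steepest-descent update it builds on has a well-known closed form: the direction maximizing alignment with the gradient under a unit ℓp-norm constraint scales each coordinate as sign(g_i)·|g_i|^(1/(p−1)). A minimal sketch under that assumption (the function name `lp_steepest_descent_step` is hypothetical, not from the paper):

```python
import numpy as np

def lp_steepest_descent_step(x, grad, lr=0.1, p=3.0):
    """One generic l_p steepest-descent step (a sketch, not the
    paper's Algorithm 3). The update direction scales each coordinate
    as sign(g_i) * |g_i|^(1/(p-1)); p=2 recovers normalized gradient
    descent, and p -> infinity approaches sign descent.
    """
    q = 1.0 / (p - 1.0)
    d = np.sign(grad) * np.abs(grad) ** q
    # Normalize the direction to unit l_p norm (guard against a zero gradient).
    norm = np.sum(np.abs(d) ** p) ** (1.0 / p)
    if norm > 0:
        d = d / norm
    return x - lr * d
```

Setting p=2 makes the step a plain normalized gradient step, which is a quick sanity check on the formula.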
Open Source Code | Yes | Code can be found at https://github.com/xinyuluo8561/Stacey.
Open Datasets | Yes | We train ResNet18 (He et al., 2016) on the CIFAR dataset (Krizhevsky, 2009) for 200 epochs... We train ResNet50 (He et al., 2016) with a batch size of 256 on ImageNet (Deng et al., 2009)... We pretrain llama-100m (Touvron et al., 2023) on the C4 subset.
Dataset Splits | Yes | We train ResNet18 (He et al., 2016) on the CIFAR dataset (Krizhevsky, 2009) for 200 epochs... We train ResNet50 (He et al., 2016) with a batch size of 256 on ImageNet (Deng et al., 2009) for 60 epochs.
Hardware Specification | No | Due to computational resource limitations, the batch sizes used in this paper are smaller than those in Lion's original paper (Chen et al., 2024).
Software Dependencies | No | The paper mentions optimizers like SGD, Adam, AdamW, and Lion and refers to...
Experiment Setup | Yes | The hyperparameter choices and tuning are summarized in Appendix C. We summarize the hyperparameters used in our experiments in Tables 4, 5, and 6. These hyperparameters are determined through a grid search. Specifically, we perform a search to identify appropriate values for the ℓp-norm, learning rate η, α, and weight decay λ. This process involves an initial rough comparison across a range of magnitudes, followed by a more precise grid search to determine the optimal values.
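The coarse-then-fine tuning procedure the quoted setup describes (an order-of-magnitude sweep first, then a finer grid around the winner) can be sketched generically as follows; the helper names (`two_stage_search`, `refine`) and the example grids are illustrative, not the paper's actual search space from Appendix C:

```python
import itertools

def two_stage_search(evaluate, coarse_grids, refine):
    """Coarse-to-fine hyperparameter search (a generic sketch).

    Stage 1 sweeps rough order-of-magnitude values; stage 2 re-runs the
    grid search on a finer grid around the stage-1 winner, which the
    caller-supplied `refine` function constructs.
    """
    def best(grids):
        # Enumerate all combinations of the grid values and keep the
        # configuration with the lowest evaluation score (e.g. val loss).
        combos = [dict(zip(grids, values))
                  for values in itertools.product(*grids.values())]
        return min(combos, key=evaluate)

    rough_best = best(coarse_grids)
    return best(refine(rough_best))

# Illustrative usage with a toy objective (not real training results):
score = lambda cfg: (cfg["lr"] - 0.01) ** 2 + (cfg["wd"] - 1e-4) ** 2
coarse = {"lr": [1e-4, 1e-3, 1e-2, 1e-1], "wd": [1e-5, 1e-4, 1e-3]}
fine = lambda b: {"lr": [b["lr"] / 2, b["lr"], b["lr"] * 2], "wd": [b["wd"]]}
best_cfg = two_stage_search(score, coarse, fine)
```

The two-stage structure keeps the total number of training runs small compared to a single dense grid over the full range.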