Structured Preconditioners in Adaptive Optimization: A Unified Analysis
Authors: Shuo Xie, Tianhao Wang, Sashank J. Reddi, Sanjiv Kumar, Zhiyuan Li
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our unified analysis challenges this prevailing view and reveals, perhaps surprisingly, that more structured preconditioners, despite using less space and computation per step, can outperform their less structured counterparts. To demonstrate this, we show that one-sided Shampoo, which is relatively much cheaper than full-matrix AdaGrad, could outperform it both theoretically and experimentally. ... In this section we empirically demonstrate the superior performance of one-sided Shampoo over other variants of AdaReg (Algorithm 1) on a simple but natural setting. Moreover, such superior performance is predicted by our theoretical analysis in Section 4.3, which in turn validates the practical utility of our theory in guiding optimizer selection. |
| Researcher Affiliation | Collaboration | ¹Toyota Technological Institute at Chicago, ²Google Research. Correspondence to: Shuo Xie <EMAIL>, Zhiyuan Li <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Adaptive Regularization Meta-Algorithm AdaReg (Gupta et al., 2017) ... Algorithm 2 One-sided Shampoo ... Algorithm 3 Two-sided Shampoo (Gupta et al., 2018) ... Algorithm 4 EMA Adaptive Regularization Meta-Algorithm ... Algorithm 5 Weighted Adaptive Regularization Meta-Algorithm |
| Open Source Code | No | The paper does not provide any specific link to a code repository, an explicit statement about code release, or mention of code in supplementary materials for the methodology described. |
| Open Datasets | No | We consider a linear regression problem ∥AX − y∥₂² where A is the data matrix and y = AX* is the label vector generated by ground-truth X*. Thus, the loss function can be equivalently written as f(X) = ⟨H, (X − X*)(X − X*)ᵀ⟩. ... Each element of the solution X* is independently sampled from N(0, 1/d). |
| Dataset Splits | No | The paper describes a synthetic linear regression problem where data and ground truth are generated. It does not mention any training, validation, or test splits for an external dataset, as the problem setup is self-contained. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper discusses various adaptive optimizers like AdaGrad-Norm, AdaGrad, Shampoo, and Adam, but it does not specify the version numbers of any software libraries, frameworks, or programming languages used for their implementation or experiments. |
| Experiment Setup | Yes | We consider X ∈ ℝ^(d×d) with d = 10³. We set the eigenvalues of H by σ₁ = ⋯ = σ₁₀ = 1 and σᵢ = 1/(i − 10)² for 11 ≤ i ≤ 10³. Each element of the solution X* is independently sampled from N(0, 1/d). We run AdaGrad-Norm, AdaGrad, one-sided Shampoo, and full-matrix AdaGrad for 100 steps from initialization X₀ = 0. ... The learning rate is tuned over five seeds for last-iterate loss and average-iterate loss respectively, selecting the one with the lowest average loss. We use float32 precision and set ϵ = 0 for all the experiments. |
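The experiment-setup row is concrete enough to sketch in code. Below is a minimal reconstruction, not the authors' implementation: H is taken to be diagonal (the paper specifies only its eigenvalues), the learning rate `eta = 1.0` is an illustrative choice rather than the paper's tuned value, and only the AdaGrad-Norm baseline (a single scalar preconditioner, the most structured extreme) is shown.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1000  # d = 10^3, as in the paper's setup

# Eigenvalues of H: sigma_1 = ... = sigma_10 = 1, sigma_i = 1/(i-10)^2 for 11 <= i <= d.
sigma = np.ones(d)
i = np.arange(11, d + 1)
sigma[10:] = 1.0 / (i - 10) ** 2

X_star = rng.normal(0.0, np.sqrt(1.0 / d), size=(d, d))  # ground truth, entries ~ N(0, 1/d)
X = np.zeros((d, d))                                     # initialization X_0 = 0

def loss(X):
    # f(X) = <H, (X - X*)(X - X*)^T>; with diagonal H this is a row-weighted squared error.
    D = X - X_star
    return float(np.sum(sigma[:, None] * D * D))

def grad(X):
    return 2.0 * sigma[:, None] * (X - X_star)

# AdaGrad-Norm: divide by the root of the accumulated squared gradient norm.
eta, G2 = 1.0, 0.0
losses = [loss(X)]
for _ in range(100):
    g = grad(X)
    G2 += float(np.sum(g * g))
    X = X - eta * g / np.sqrt(G2)  # epsilon = 0, matching the paper's setting
    losses.append(loss(X))
```

Reproducing the paper's comparison would additionally require diagonal AdaGrad, one-sided Shampoo, and full-matrix AdaGrad on the same problem, each with the learning rate tuned over five seeds as the row describes.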