Dimension-Free Adaptive Subgradient Methods with Frequent Directions

Authors: Sifan Yang, Yuanyu Wan, Peijia Li, Yibo Wang, Xiao Zhang, Zhewei Wei, Lijun Zhang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results have verified the efficiency and effectiveness of our approaches. Finally, we conduct experiments on online classification and neural network training to validate the superiority of our methods.
Researcher Affiliation | Academia | 1National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 2School of Artificial Intelligence, Nanjing University, Nanjing, China 3School of Software Technology, Zhejiang University, Ningbo, China 4Hangzhou High Tech Zone (Binjiang) Institute of Blockchain and Data Security, Hangzhou, China 5Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China 6Pazhou Laboratory (Huangpu), Guangzhou, China. Correspondence to: Lijun Zhang <EMAIL>.
Pseudocode | Yes | Algorithm 1: Frequent Directions (FD); Algorithm 2: Follow the Sketchy Leader (FTSL); Algorithm 3: Follow the Fast Sketchy Leader (FTFSL); Algorithm 4: Frequent Directions in General Form; Algorithm 5: FTSL-Shampoo; Algorithm 6: Online to Batch Conversion
Open Source Code | No | The paper does not provide any explicit statements about releasing source code, nor does it include links to code repositories.
Open Datasets | Yes | First, we perform online classification to evaluate the performance of our methods with two real-world datasets from the LIBSVM (Chang & Lin, 2011) repository: Gisette and Epsilon... The experiments involve training ResNet18 and ResNet34 models (He et al., 2016) on the CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009)... Concretely, we train a 2-layer Transformer (Vaswani et al., 2017) over the WikiText-2 dataset (Merity, 2016).
Dataset Splits | Yes | For the Gisette dataset, we set the batch size n = 32, the sketching size τ = 50 to be 1% of the original dimensionality, and T = 2000... The Epsilon dataset consists of 400,000 training samples and 100,000 testing samples... The experiments involve training ResNet18 and ResNet34 models (He et al., 2016) on the CIFAR-10 and CIFAR-100 datasets (Krizhevsky, 2009), respectively, for 200 iterations with a batch size of 128... The batch size is set as 64 and all methods are trained for 40 epochs with dropout rate 0.1.
Hardware Specification | Yes | All experiments are conducted on 8 NVIDIA 3090 GPUs.
Software Dependencies | No | The paper mentions using LIBSVM but does not provide specific version numbers for any software, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | For the Gisette dataset, we set the batch size n = 32, the sketching size τ = 50 to be 1% of the original dimensionality, and T = 2000. For the Epsilon dataset, we set the batch size n = 128, τ = 20, and T = 5000... For ADA-FFD, S-ADA, and FTFSL, the sketching size τ is determined based on the dimensionality of the flattened gradient, which is defined as: τ = min{0.1d, 100}... For S-Shampoo and FTSL-Shampoo, due to their memory efficiency, we set τ = 0.1 d_i... We use 256-dimensional word embeddings, 256 hidden units, and 2 heads. We also clip the gradients by norm 0.5 to guard against exploding gradients. The batch size is set as 64 and all methods are trained for 40 epochs with dropout rate 0.1.
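Algorithm 1 in the paper is the standard Frequent Directions sketch that the proposed methods build on. As a reference point only (this is a minimal NumPy sketch of the classic doubled-buffer FD routine of Liberty (2013) and Ghashami et al. (2016), not the authors' implementation), it maintains an ℓ-row summary B of a row stream such that B^T B approximates A^T A:

```python
import numpy as np

def frequent_directions(A, ell):
    """Sketch the rows of A (n x d) into B (ell x d) so that
    ||A.T @ A - B.T @ B||_2 <= ||A||_F**2 / ell."""
    n, d = A.shape
    B = np.zeros((2 * ell, d))  # doubled buffer amortizes the SVD cost
    nrows = 0
    for row in A:
        if nrows == 2 * ell:    # buffer full: shrink back to ell rows
            B = _shrink(B, ell)
            nrows = ell
        B[nrows] = row
        nrows += 1
    return _shrink(B, ell)[:ell]  # final shrink, keep the top ell rows

def _shrink(B, ell):
    # SVD the buffer and subtract the ell-th largest squared singular
    # value from all of them, which zeroes out at least half of the rows.
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    delta = s[min(ell, len(s)) - 1] ** 2
    s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
    B_new = np.zeros_like(B)
    B_new[: len(s)] = s_shrunk[:, None] * Vt
    return B_new
```

The sketching size ℓ here plays the role of τ in the experiment setup above: the adaptive methods in the paper feed (sub)gradients into such a sketch to approximate the full d x d preconditioner at O(τd) memory.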