AdaGrad under Anisotropic Smoothness

Authors: Yuxing Liu, Rui Pan, Tong Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in logistic regression and instruction-following fine-tuning tasks provide strong evidence to support our novel assumption and theoretical analysis. ... 6 EXPERIMENTAL RESULTS ... We utilize real-world datasets a4a, a6a, a9a, real-sim and rcv1.binary from libsvm (Chang & Lin, 2011) ... For nonconvex cases, we check the instruction-following fine-tuning task on the Alpaca (Taori et al., 2023) dataset with the GPT-2 (Radford et al., 2019) model.
Researcher Affiliation | Academia | Yuxing Liu, Rui Pan, Tong Zhang; University of Illinois Urbana-Champaign
Pseudocode | Yes | Algorithm 1 Adagrad
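For context on the pseudocode the review refers to (Algorithm 1, Adagrad), here is a minimal sketch of the standard AdaGrad update: each coordinate's step is scaled by the root of its accumulated squared gradients. This is the textbook form of the algorithm, not the paper's exact Algorithm 1; the function name and test problem are illustrative.

```python
import numpy as np

def adagrad_step(w, grad, state_sum, lr=0.1, eps=1e-8):
    """One AdaGrad update (textbook form, not the paper's code).

    Accumulates squared gradients per coordinate, then takes a
    per-coordinate step scaled by 1/sqrt(accumulated sum).
    """
    state_sum = state_sum + grad ** 2
    w = w - lr * grad / (np.sqrt(state_sum) + eps)
    return w, state_sum

# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([1.0, -2.0])
s = np.zeros_like(w)
for _ in range(200):
    w, s = adagrad_step(w, w.copy(), s, lr=0.5)
# w is driven close to the minimizer at the origin
```

Per-coordinate scaling is exactly what makes AdaGrad attractive under anisotropic smoothness: directions with consistently large gradients get smaller effective step sizes.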
Open Source Code | No | The paper discusses the use of a third-party library: 'In all our implementations, we use the version transformers==4.38.2.' and mentions its license. However, there is no explicit statement or link indicating that the authors' own implementation code for the methodology described in this paper is open-sourced.
Open Datasets | Yes | We utilize real-world datasets a4a, a6a, a9a, real-sim and rcv1.binary from libsvm (Chang & Lin, 2011) ... For nonconvex cases, we check the instruction-following fine-tuning task on the Alpaca (Taori et al., 2023) dataset with the GPT-2 (Radford et al., 2019) model. ... Regarding licenses, the Alpaca dataset is released under the Creative Commons Attribution Non Commercial 4.0 International Public License (https://github.com/tatsu-lab/stanford_alpaca/blob/main/DATA_LICENSE)
Dataset Splits | No | The paper mentions using specific datasets (libsvm datasets, Alpaca) and refers to 'fine-tuning tasks', but it does not explicitly provide details about the training, validation, or test splits used for its experiments. For example, it does not specify percentages or sample counts for how these datasets were divided.
Hardware Specification | Yes | All experiments are conducted on a single A40 GPU, where gradient accumulation is adopted for batch sizes larger than 128 to reduce memory cost.
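The gradient accumulation mentioned above (used for effective batch sizes larger than 128 on one GPU) can be sketched as follows. This is a generic illustration with a hypothetical `grad_fn` helper, not the authors' implementation: the gradient of a large batch is computed as the size-weighted average of micro-batch gradients, so only one micro-batch is in memory at a time.

```python
import numpy as np

def accumulated_grad(grad_fn, data, micro_batch=128):
    """Average gradients over micro-batches (hypothetical helper).

    grad_fn(chunk) returns the mean gradient over a chunk; weighting
    each result by the chunk length recovers the full-batch mean
    gradient without ever materializing the full batch.
    """
    total, n = None, 0
    for start in range(0, len(data), micro_batch):
        chunk = data[start:start + micro_batch]
        g = grad_fn(chunk) * len(chunk)  # undo the per-chunk mean
        total = g if total is None else total + g
        n += len(chunk)
    return total / n

# Usage with a toy "gradient": the mean of the chunk values.
data = np.arange(10, dtype=float)
full_batch = accumulated_grad(lambda c: c.mean(), data, micro_batch=3)
```

Because the micro-batch means are re-weighted by chunk size, the result matches the single large-batch gradient exactly (up to floating point).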
Software Dependencies | Yes | In all our implementations, we use the version transformers==4.38.2.
Experiment Setup | Yes | grid searches are conducted for both algorithms, with the search space being initial learning rate η ∈ {10.0, 1.0, 0.1, 0.01} and learning rate schedules being either constant ηt = η or inverse square root decay ηt = η/√(t+1) ... For all experiments, we run 3 epochs of optimization with SGD and Adagrad... We search the learning rate η ∈ {1.0, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴, 10⁻⁵, 10⁻⁶} ... The maximum sequence length is set to 512, along with the learning rate schedule being set to cosine decay (Loshchilov & Hutter, 2016).
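The three learning-rate schedules named in the setup (constant, inverse square root decay, and cosine decay) can be written down directly from their formulas. A minimal sketch, assuming a horizon `T` for cosine decay; the factory-function style is a presentation choice, not the paper's code.

```python
import math

def constant(eta):
    # eta_t = eta for every step t
    return lambda t: eta

def inv_sqrt_decay(eta):
    # eta_t = eta / sqrt(t + 1), as in the paper's search space
    return lambda t: eta / math.sqrt(t + 1)

def cosine_decay(eta, T):
    # eta_t = eta * (1 + cos(pi * t / T)) / 2 (Loshchilov & Hutter, 2016),
    # decaying from eta at t=0 to 0 at t=T
    return lambda t: eta * 0.5 * (1.0 + math.cos(math.pi * t / T))

sched = inv_sqrt_decay(1.0)
lr_at_3 = sched(3)  # 1 / sqrt(4) = 0.5
```

A grid search as described then just loops over η ∈ {10.0, 1.0, 0.1, 0.01} and over these schedule constructors, running the optimizer once per (η, schedule) pair.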