A second-order-like optimizer with adaptive gradient scaling for deep learning
Authors: Jerome Bolte, Ryan Boustany, Edouard Pauwels, Andrei Purica
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | INNAprop is evaluated on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT. We also train GPT-2 (OpenWebText) from scratch and with LoRA fine-tuning (E2E). INNAprop consistently offers close performance to AdamW, while performing significantly better in our LLM training experiments, achieving faster convergence and higher accuracy with minimal hyperparameter tuning, even at large scale. |
| Researcher Affiliation | Collaboration | Jérôme Bolte (EMAIL), Toulouse School of Economics, University of Toulouse Capitole, Toulouse, France; Ryan Boustany (EMAIL), Toulouse School of Economics, University of Toulouse Capitole, and Thales LAS France; Edouard Pauwels (EMAIL), Toulouse School of Economics, University of Toulouse Capitole, Toulouse, France; Andrei Purica (EMAIL), Thales LAS France |
| Pseudocode | Yes | Algorithm 1: Deep learning implementation of INNAprop; Algorithm 2: INNAprop; Algorithm 3: INNAprop with (α, β) = (1, 1); Algorithm 4: DINAdam |
| Open Source Code | Yes | Our code is public: https://github.com/innaprop/innaprop |
| Open Datasets | Yes | INNAprop is evaluated on CIFAR-10, Food101, and ImageNet with ResNets, VGG, DenseNet, and ViT. We also train GPT-2 (OpenWebText) from scratch and with LoRA fine-tuning (E2E). ...CIFAR10 (Krizhevsky & Hinton, 2010)...ImageNet-1k benchmark (Krizhevsky et al., 2012)...Food101 dataset (Bossard et al., 2014)...OpenWebText dataset (Gokaslan & Cohen, 2019)...E2E dataset (Novikova et al., 2017) |
| Dataset Splits | Yes | We fine-tune the same GPT-2 models on the E2E dataset (Novikova et al., 2017), consisting of roughly 42,000 training examples and 4,600 test examples from the restaurant domain. |
| Hardware Specification | Yes | 1 V100 GPU (CIFAR-10 and Food101 experiments); 4 V100 GPUs (ResNet18 and ResNet50 ImageNet experiments); 8 A100 GPUs (ViT-B/32 ImageNet experiment); 4 A100 GPUs (GPT-2 from-scratch experiment); 1 A100 GPU (GPT-2 with LoRA experiment). |
| Software Dependencies | No | The paper mentions using "PyTorch tutorial code", "optuna (Akiba et al., 2019)", the "nanoGPT repository", and the "LoRA codebase" but does not specify version numbers for these software components or libraries. |
| Experiment Setup | Yes | Hyperparameter tuning: We consider VGG11 (Simonyan & Zisserman, 2014) and ResNet18 (He et al., 2016) models trained on CIFAR10 (Krizhevsky & Hinton, 2010). We fix a cosine scheduler with Tmax = 200, as recommended for AdamW, and γmin = 0 (see Appendix D for more details), and consider two weight decay parameters, λ = 0 or λ = 0.01 (the default value for AdamW). We tune the initial learning rate γ0 only for AdamW. We find γ0 = 10⁻³, which is also the baseline value reported for AdamW in this experiment (see Appendix E). For INNAprop, we tune only α and β, using γ0 = 10⁻³ from AdamW. Using optuna (Akiba et al., 2019), we perform a grid search over (α, β) ∈ {0.1, 0.5, 0.9, . . . , 3.5, 4.0}. Appendix G provides detailed tables of hyperparameter values for various experiments, including architecture, epochs, batch size, learning rates, weight decay, and specific settings for optimizers and training. |
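The tuning protocol above (fix γ0 = 10⁻³, exhaustively search the (α, β) grid, keep the pair with the best validation result) can be sketched as follows. This is a minimal stdlib sketch, not the paper's optuna code: `validation_loss` is a hypothetical stand-in for a full CIFAR-10 training run with INNAprop, and the grid lists only the values the paper spells out (the intermediate steps are elided there).

```python
import itertools

GAMMA0 = 1e-3  # initial learning rate, fixed from the AdamW tuning

def validation_loss(alpha, beta, gamma0=GAMMA0):
    """Hypothetical surrogate for training INNAprop(alpha, beta) and
    returning validation loss; a toy bowl minimized at (0.9, 0.9)."""
    return (alpha - 0.9) ** 2 + (beta - 0.9) ** 2

# Grid values explicitly listed in the paper; the "..." in the paper
# elides further intermediate values, which are omitted here.
grid = [0.1, 0.5, 0.9, 3.5, 4.0]

# Exhaustive search over all (alpha, beta) pairs, as in the paper's
# optuna GridSampler setup.
best_pair = min(itertools.product(grid, grid),
                key=lambda ab: validation_loss(*ab))
print(best_pair)
```

With the toy surrogate this prints `(0.9, 0.9)`; in the actual experiment each grid point costs one full training run, so the grid size directly bounds the tuning budget.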