Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Unified Convergence Theory of Stochastic and Variance-Reduced Cubic Newton Methods

Authors: El Mahdi Chayti, Martin Jaggi, Nikita Doikov

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We report the results in terms of time and gradient arithmetic computations needed to arrive at a given level of convergence. Figure 1 shows that the lazy version saves both time and arithmetic computations without sacrificing the convergence precision. In these graphs, Gradcost is computed using the convention that computing one Hessian is d times as expensive as computing one gradient.
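The Gradcost convention quoted above can be sketched as a small accounting helper (`grad_cost` is a hypothetical name for illustration; the paper's released code may account for this differently):

```python
def grad_cost(n_grad_evals, n_hess_evals, d):
    """Gradient-equivalent arithmetic cost: one Hessian evaluation
    is counted as d gradient evaluations, where d is the problem
    dimension (the paper's Gradcost convention)."""
    return n_grad_evals + d * n_hess_evals

# Example: 100 gradients plus 2 Hessians in dimension d = 50
# count the same as 200 gradient evaluations.
```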
Researcher Affiliation | Academia | El Mahdi Chayti, Martin Jaggi, Nikita Doikov: Machine Learning and Optimization Laboratory (MLO), EPFL
Pseudocode | Yes | Algorithm 1 (Cubic Newton with helper functions):
Require: x0 ∈ ℝ^d, S, m ≥ 1, M > 0.
1: for t = 0, …, Sm − 1 do
2:   if t mod m = 0 then
3:     update the snapshot x̃t (using previous states (xi)i≤t)
4:   else
5:     x̃t = x̃t−1
6:   form helper functions h1, h2
7:   compute the gradient gt = G(h1, xt, x̃t) and the Hessian Ht = H(h2, xt, x̃t)
8:   compute the cubic step xt+1 ∈ arg min over y ∈ ℝ^d of ΩM,gt,Ht(y, xt)
9: return xout using the history (xi)0≤i≤Sm
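A minimal numerical sketch of the cubic step (line 8 of the algorithm) and the lazy-Hessian loop, assuming a positive semidefinite Hessian and using NumPy. This is an illustration only, not the authors' released implementation; `cubic_step` and `lazy_cubic_newton` are hypothetical names:

```python
import numpy as np

def cubic_step(g, H, M, tol=1e-10):
    """Solve min_s  g^T s + 0.5 s^T H s + (M/6) ||s||^3  for PSD H.
    The minimizer has the form s(r) = -(H + (M r / 2) I)^{-1} g with
    r = ||s(r)||; since ||s(r)|| - r is decreasing in r, the fixed
    point can be found by bisection."""
    d = g.shape[0]
    if not np.any(g):          # zero gradient: the step is zero
        return np.zeros(d)
    I = np.eye(d)
    def s_of(r):
        return -np.linalg.solve(H + 0.5 * M * r * I, g)
    lo, hi = 0.0, 1.0
    while np.linalg.norm(s_of(hi)) > hi:   # bracket the fixed point
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(s_of(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return s_of(hi)

def lazy_cubic_newton(grad, hess, x0, M, m, n_steps):
    """Recompute the Hessian only every m steps (the 'lazy' variant);
    the gradient is evaluated at every step."""
    x = x0.copy()
    for t in range(n_steps):
        if t % m == 0:
            H = hess(x)        # expensive: reused for the next m steps
        x = x + cubic_step(grad(x), H, M)
    return x
```

On a quadratic objective the Hessian is constant, so the lazy variant coincides with the exact one; reusing the Hessian for m steps only pays off (in Gradcost) on genuinely nonlinear objectives.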
Open Source Code | Yes | Our code is available with all the details necessary for reproducing our results at https://github.com/elmahdichayti/Unified-Convergence-Theory-of-Cubic-Newton-s-method.
Open Datasets | Yes | To verify our findings from Subsection 3.3, we consider a logistic regression problem (21) with ℓ2-regularization on the a9a data set (Chang & Lin, 2011). ... We consider, in this section, other datasets from the LibSVM library (Chang & Lin, 2011). ... Figure 7: Experiments on Mushrooms (n = 8124, d = 112), Covtype (n = 581012, d = 54), w8a (n = 49749, d = 300) datasets.
Dataset Splits | No | The paper mentions using well-known datasets like a9a, Mushrooms, Covtype, w8a, MNIST, and CIFAR10. While these datasets typically have standard splits, the paper does not explicitly describe the specific training/test/validation splits used for its experiments, nor does it provide percentages, sample counts, or references for the splits within the main text.
Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., CPU or GPU models, or cloud computing instances) used for running the experiments. While it reports results in terms of 'Time, s' in figures, the underlying hardware specifications are not provided.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the implementation of the methods or experiments. It points to a GitHub repository, but the paper itself lacks this detail.
Experiment Setup | No | The paper describes various experiments, including logistic regression with ℓ2-regularization and diagonal neural networks. It mentions parameters like 'm' for the helper functions and problem dimensions (e.g., 'd = 123, n = 32561'). However, it does not provide concrete hyperparameters for the optimization algorithms (e.g., specific learning rates for Adam, the ℓ2-regularization strength, batch sizes for SGD, or the number of epochs), making it difficult to reproduce the exact experimental setup from the text alone.