OPTAMI: Global Superlinear Convergence of High-order Methods

Authors: Dmitry Kamzolov, Artem Agafonov, Dmitry Pasechnyuk, Alexander Gasnikov, Martin Takáč

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | First, we show that the basic high-order methods, such as the Cubic Regularized Newton Method, exhibit global superlinear convergence for µ-strongly star-convex functions, a class that includes µ-strongly convex functions and some non-convex functions. Theoretical convergence results are both inspired and supported by the practical performance of these methods. Secondly, we propose a practical version of the Nesterov Accelerated Tensor method, called NATA. It significantly outperforms the classical variant and other high-order acceleration techniques in practice. The convergence of NATA is also supported by theoretical results. Finally, we introduce an open-source computational library for high-order methods, called OPTAMI. This library includes various methods, acceleration techniques, and subproblem solvers, all implemented as PyTorch optimizers, thereby facilitating the practical application of high-order methods to a wide range of optimization problems. We hope this library will simplify research and practical comparison of methods beyond first-order.
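The Cubic Regularized Newton step mentioned above can be sketched in a few lines. This is a minimal NumPy illustration, not OPTAMI's actual API (all function names here are ours): it solves the cubic subproblem min_h g^T h + 0.5 h^T H h + (M/6)||h||^3 by bisection on the step radius r = ||h||, assuming H + (M r / 2) I is positive definite on the search interval.

```python
import numpy as np

def cubic_newton_step(grad, hess, M, tol=1e-10, r_max=1e6):
    """One Cubic Regularized Newton step (Nesterov-Polyak style sketch).

    The stationarity condition of the cubic model gives
    h = -(H + (M r / 2) I)^{-1} g  with  r = ||h||,
    so we bisect on r until ||h(r)|| matches r.
    """
    I = np.eye(grad.shape[0])

    def h_of(r):
        return -np.linalg.solve(hess + 0.5 * M * r * I, grad)

    lo, hi = 0.0, r_max
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(h_of(mid)) > mid:
            lo = mid  # step longer than radius: raise the regularization
        else:
            hi = mid
        if hi - lo < tol:
            break
    return h_of(hi)

def cubic_newton(f_grad, f_hess, x0, M, iters=20):
    """Run the method from x0 with oracle callables for gradient and Hessian."""
    x = x0.copy()
    for _ in range(iters):
        x = x + cubic_newton_step(f_grad(x), f_hess(x), M)
    return x
```

On a strongly convex quadratic the iterates converge to the exact minimizer, since the step approaches the full Newton step as the gradient shrinks.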
Researcher Affiliation | Academia | 1 Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE; 2 Toulouse School of Economics, Toulouse, France; 3 Moscow Institute of Physics and Technology, Dolgoprudny, Russia; 4 Innopolis University AI Research Center, Kazan, Russia; 5 Institute for System Programming RAS, Moscow, Russia
Pseudocode | Yes | Algorithm 1: Nesterov Accelerated Tensor Method; Algorithm 2: Nesterov Accelerated Tensor Method with At-Adaptation (NATA); Algorithm 3: Nesterov Accelerated Tensor Method; Algorithm 4: Nesterov Accelerated Tensor Method with At-Adaptation (NATA); Algorithm 5: Inexact p-th-order Near-optimal Accelerated Tensor Method; Algorithm 6: Inexact p-th-order Proximal Point Method with Segment Search; Algorithm 7: Optimal Tensor Method
Open Source Code | Yes | Finally, we introduce an open-source computational library for high-order methods, called OPTAMI. This library includes various methods, acceleration techniques, and subproblem solvers, all implemented as PyTorch optimizers, thereby facilitating the practical application of high-order methods to a wide range of optimization problems. We hope this library will simplify research and practical comparison of methods beyond first-order. https://github.com/OPTAMI/OPTAMI
Open Datasets | Yes | We evaluate performance on the a9a dataset (d = 123, n = 32561) and w8a (d = 300, n = 49749) from LibSVM by Chang & Lin (2011).
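The a9a and w8a datasets are distributed in the sparse LIBSVM text format: a label followed by index:value pairs. A minimal parser sketch (function names are ours; in practice one would typically use a ready-made loader such as scikit-learn's `load_svmlight_file`):

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM-format line: 'label idx:val idx:val ...'.

    Indices in LIBSVM files are 1-based; we keep them as given.
    Returns the label and a sparse dict of {index: value}.
    """
    parts = line.split()
    label = float(parts[0])
    features = {int(idx): float(val)
                for idx, val in (p.split(":") for p in parts[1:])}
    return label, features

def parse_libsvm(text):
    """Parse a whole LIBSVM file given as a string, skipping blank lines."""
    return [parse_libsvm_line(ln) for ln in text.splitlines() if ln.strip()]
```

For example, `parse_libsvm_line("+1 3:1 11:0.5")` yields label 1.0 with features {3: 1.0, 11: 0.5}.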
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits, percentages, or a splitting methodology beyond naming the datasets used and some preprocessing steps.
Hardware Specification | Yes | All methods and experiments were run using Python 3.11 and PyTorch 2.2.2 on a 13-inch MacBook Pro (2019) with a 1.4 GHz quad-core Intel Core i5 and 8 GB of memory.
Software Dependencies | Yes | All methods and experiments were run using Python 3.11 and PyTorch 2.2.2 on a 13-inch MacBook Pro (2019) with a 1.4 GHz quad-core Intel Core i5 and 8 GB of memory. All computations are done in torch.double. All methods are implemented as PyTorch 2 optimizers.
Experiment Setup | Yes | For Figures 6, 5, and 7a, we choose the regularizer µ = 10⁻⁴ to obtain a strongly convex function f. For Figures 2, 4, and 8, we choose the regularizer µ = 0 to obtain a convex function f. For better conditioning, we normalize the data features so that ∥aᵢ∥ = 1. For the normalized case, we choose the theoretical L2 = 0.1. We set L3 = L2 = 0.1 to demonstrate the convergence rates for the same constants L; note that the actual L3 is smaller than 0.1. We choose the starting point x0 = 3e, where e is the vector of all ones. For Figures 1 and 7b, we set d = 20, µ = 10⁻³, and we tuned L3 = L2 = 10. For Poisson regression: d = 21, n = 6000. We set L1 = L2 = L3 = 1 and x0 = e, the vector of all ones.
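The setup above (logistic regression with an optional ℓ2 regularizer µ and row-normalized features) can be sketched as follows. This is a minimal NumPy version under our own naming, not the paper's code: µ = 0 gives the convex setting and µ > 0 the strongly convex one.

```python
import numpy as np

def normalize_rows(A):
    """Scale each feature vector a_i so that ||a_i|| = 1, as in the
    paper's preprocessing for better conditioning."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return A / np.maximum(norms, 1e-12)

def logistic_loss(w, A, y, mu):
    """l2-regularized logistic loss:
    (1/n) sum_i log(1 + exp(-y_i <a_i, w>)) + (mu/2) ||w||^2.

    Uses logaddexp for a numerically stable log(1 + exp(z)).
    """
    z = -y * (A @ w)
    return np.mean(np.logaddexp(0.0, z)) + 0.5 * mu * np.dot(w, w)
```

At the all-zeros starting point the unregularized loss is exactly log 2, which gives a quick sanity check; the paper instead starts from x0 = 3e (all entries equal to 3).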