OPTAMI: Global Superlinear Convergence of High-order Methods
Authors: Dmitry Kamzolov, Artem Agafonov, Dmitry Pasechnyuk, Alexander Gasnikov, Martin Takáč
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | First, we show that the basic high-order methods, such as the Cubic Regularized Newton Method, exhibit global superlinear convergence for µ-strongly star-convex functions, a class that includes µ-strongly convex functions and some non-convex functions. Theoretical convergence results are both inspired and supported by the practical performance of these methods. Secondly, we propose a practical version of the Nesterov Accelerated Tensor method, called NATA. It significantly outperforms the classical variant and other high-order acceleration techniques in practice. The convergence of NATA is also supported by theoretical results. Finally, we introduce an open-source computational library for high-order methods, called OPTAMI. This library includes various methods, acceleration techniques, and subproblem solvers, all implemented as PyTorch optimizers, thereby facilitating the practical application of high-order methods to a wide range of optimization problems. We hope this library will simplify research and practical comparison of methods beyond first-order. |
| Researcher Affiliation | Academia | 1 Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE 2 Toulouse School of Economics, Toulouse, France 3 Moscow Institute of Physics and Technology, Dolgoprudny, Russia 4 Innopolis University AI Research Center, Kazan, Russia 5 Institute for System Programming RAS, Moscow, Russia |
| Pseudocode | Yes | Algorithm 1 Nesterov Accelerated Tensor Method Algorithm 2 Nesterov Accelerated Tensor Method with At-Adaptation (NATA) Algorithm 3 Nesterov Accelerated Tensor Method Algorithm 4 Nesterov Accelerated Tensor Method with At-Adaptation (NATA) Algorithm 5 Inexact p-th order Near-optimal Accelerated Tensor Method Algorithm 6 Inexact p-th-order Proximal Point Method with Segment Search Algorithm 7 Optimal Tensor Method |
| Open Source Code | Yes | Finally, we introduce an open-source computational library for high-order methods, called OPTAMI. This library includes various methods, acceleration techniques, and subproblem solvers, all implemented as PyTorch optimizers, thereby facilitating the practical application of high-order methods to a wide range of optimization problems. We hope this library will simplify research and practical comparison of methods beyond first-order. https://github.com/OPTAMI/OPTAMI |
| Open Datasets | Yes | We evaluate performance on the a9a dataset (d = 123, n = 32561) and w8a (d = 300, n = 49749) from LibSVM by Chang & Lin (2011). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits, percentages, or methodology for splitting, beyond mentioning the datasets used and some preprocessing steps. |
| Hardware Specification | Yes | All methods and experiments were performed using Python 3.11, PyTorch 2.2.2, on a 13-inch MacBook Pro 2019 with a 1.4 GHz Quad-Core Intel Core i5 and 8 GB memory. |
| Software Dependencies | Yes | All methods and experiments were performed using Python 3.11, PyTorch 2.2.2, on a 13-inch MacBook Pro 2019 with a 1.4 GHz Quad-Core Intel Core i5 and 8 GB memory. All computations are done in torch.double. All methods are implemented as PyTorch 2 optimizers. |
| Experiment Setup | Yes | For Figures 6, 5, and 7a, we choose the regularizer µ = 10^-4 to get a strongly convex function f. For Figures 2, 4, and 8, we choose the regularizer µ = 0 to get a convex function f. For better conditioning, we normalize the data features so that ‖a_i‖ = 1. For the normalized case, we choose the theoretical L2 = 0.1. We set L3 = L2 = 0.1 to demonstrate the convergence rates for the same constants L. Note that the actual L3 is smaller than 0.1. We choose the starting point x0 = 3e, where e is the vector of all ones. For Figures 1 and 7b, we set d = 20, µ = 10^-3, and we tuned L3 = L2 = 10. For Poisson regression: d = 21, n = 6000. We set L1 = L2 = L3 = 1 and x0 = e, the vector of all ones. |