Learning Complexity of Gradient Descent and Conjugate Gradient Algorithms
Authors: Xianqi Jiao, Jia Liu, Zhiping Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we introduce a framework that models algorithm selection as a statistical learning problem, so that learning complexity can be estimated by the pseudo-dimension of the algorithm group. We first propose a new cost measure for unconstrained optimization algorithms, inspired by the concept of the primal-dual integral in mixed-integer linear programming. Based on the new cost measure, we derive an improved upper bound for the pseudo-dimension of the gradient descent algorithm group by discretizing the set of step size configurations. Moreover, we generalize our findings from the gradient descent algorithm to the conjugate gradient algorithm group for the first time, and prove the existence of a learning algorithm capable of probabilistically identifying the optimal algorithm with a sufficiently large sample size. |
| Researcher Affiliation | Academia | Xianqi Jiao, Jia Liu, Zhiping Chen; School of Mathematics and Statistics, Xi'an Jiaotong University, Shaanxi, China; EMAIL, EMAIL |
| Pseudocode | No | The paper describes the steps of the Gradient Descent (GD) and Conjugate Gradient (CG) algorithms in textual paragraphs, such as 'Recall the basic gradient descent algorithm for minimizing a function f given an initial point z_0 over R^N: 1. Initialize z := z_0, 2. While \|\|∇f(z)\|\|_2 > ν, z := z − ρ∇f(z).', but it does not present them as a structured pseudocode block or algorithm figure. |
| Open Source Code | No | The paper mentions an 'Extended version https://arxiv.org/abs/2412.13473' where 'All proofs and some introductory background' can be found. There is no explicit statement or link indicating the release of source code for the methodology described in this paper. |
| Open Datasets | No | The paper is a theoretical work on learning complexity and does not describe or utilize any datasets for experimental evaluation. Therefore, no information about publicly available or open datasets is provided. |
| Dataset Splits | No | The paper is a theoretical work and does not use any datasets for experiments, thus no information on dataset splits is provided. |
| Hardware Specification | No | The paper is a theoretical work focusing on learning complexity bounds for optimization algorithms and does not describe any experimental setup, therefore no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is a theoretical work and does not describe any experimental setup, thus no specific software dependencies with version numbers are provided. |
| Experiment Setup | No | The paper presents a theoretical framework and derives complexity bounds for algorithms; it does not include an experimental section with specific setup details, hyperparameters, or training configurations. |
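The gradient descent loop quoted in the Pseudocode row (iterate z := z − ρ∇f(z) until the gradient norm falls below the tolerance ν) can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `gradient_descent` and the quadratic test objective f(z) = 0.5·||z||² are assumptions introduced here.

```python
import numpy as np

def gradient_descent(grad, z0, rho, nu, max_iter=10_000):
    """Iterate z := z - rho * grad(z) until ||grad(z)||_2 <= nu.

    Matches the textual description quoted from the paper: a fixed
    step size rho and a gradient-norm stopping tolerance nu.
    """
    z = np.asarray(z0, dtype=float)
    for _ in range(max_iter):
        g = grad(z)
        if np.linalg.norm(g) <= nu:
            break
        z = z - rho * g
    return z

# Illustrative objective (not from the paper): f(z) = 0.5 * ||z||^2,
# whose gradient is z itself, so each step halves z when rho = 0.5.
z_star = gradient_descent(grad=lambda z: z, z0=[4.0, -2.0], rho=0.5, nu=1e-8)
```

The paper's learning-complexity results concern choosing the step size configuration ρ from data; in this sketch ρ is simply a fixed argument.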