Are Convex Optimization Curves Convex?
Authors: Guy Barzilai, Ohad Shamir, Moslem Zamani
TMLR 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we study when we might expect the optimization curve induced by gradient descent to be convex precluding, for example, an initial plateau followed by a sharp decrease, making it difficult to decide when optimization should stop. We show, perhaps surprisingly, that the answer crucially depends on the choice of the step size. In particular, for the range of step sizes which are known to result in monotonic convergence to an optimal value, we characterize a regime where the optimization curve will be provably convex, and a regime where the curve can be non-convex. We also extend our results to gradient flow, and to the closely-related but different question of whether the gradient norm decreases monotonically. |
| Researcher Affiliation | Academia | Guy Barzilai EMAIL Tel-Aviv University Ohad Shamir EMAIL Weizmann Institute of Science Moslem Zamani EMAIL UCLouvain |
| Pseudocode | No | The paper contains mathematical derivations, proofs, and definitions, but no structured pseudocode or algorithm blocks are present. |
| Open Source Code | No | The paper does not contain any explicit statements about the release of source code, nor does it provide links to any code repositories or mention code in supplementary materials. |
| Open Datasets | No | The paper is theoretical and does not use any datasets for experiments. The examples provided (e.g., f(x) = x^2, f(x) = |x| + max{0, x}) are for illustrative and theoretical proof purposes, not based on empirical data. |
| Dataset Splits | No | Since the paper is theoretical and does not use any datasets for empirical evaluation, there is no mention of dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not involve empirical experiments requiring specific hardware. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical, focusing on mathematical proofs and analysis. It does not mention any specific software or programming libraries with version numbers used for implementation or experimentation. |
| Experiment Setup | No | The paper focuses on theoretical analysis and proofs regarding optimization curves. It does not describe any empirical experiments or their setup, and thus no hyperparameters or system-level training settings are provided. |