On Lower and Upper Bounds in Smooth and Strongly Convex Optimization

Authors: Yossi Arjevani, Shai Shalev-Shwartz, Ohad Shamir

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We develop a novel framework to study smooth and strongly convex optimization algorithms. Focusing on quadratic functions, we are able to examine optimization algorithms as a recursive application of linear operators. This, in turn, reveals a powerful connection between a class of optimization algorithms and the analytic theory of polynomials, whereby new lower and upper bounds are derived. Lastly, expressing it as an optimal solution for the corresponding optimization problem over polynomials, as formulated by our framework, we present a novel systematic derivation of Nesterov's well-known Accelerated Gradient Descent method. Let us demonstrate the gain in performance allowed by A3 in a very simple setting. Define A to be Diag(µ, L) rotated counter-clockwise by 45... Figure 4 shows the error of A3, AGD and HB vs. iteration number.
Researcher Affiliation | Academia | Yossi Arjevani (EMAIL), Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Shai Shalev-Shwartz (EMAIL), School of Computer Science and Engineering, The Hebrew University, Givat Ram, Jerusalem 9190401, Israel; Ohad Shamir (EMAIL), Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
Pseudocode | No | The paper describes various algorithms (Full Gradient Descent, Newton's method, the Heavy Ball method, Accelerated Gradient Descent, Stochastic Coordinate Descent, Conjugate Gradient Descent, Stochastic Gradient Descent) through mathematical equations and textual descriptions, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, links to repositories, or mentions of code in supplementary materials.
Open Datasets | No | The paper uses mathematically constructed quadratic functions and specific settings for analysis and demonstration, such as "Define A to be Diag(µ, L) rotated counter-clockwise by 45", rather than relying on or providing access to publicly available datasets for experimental evaluation.
Dataset Splits | No | The paper focuses on theoretical analysis and demonstrations on mathematically constructed quadratic functions, and therefore does not discuss dataset splits such as training, validation, or test sets.
Hardware Specification | No | The paper analyzes optimization algorithms and their convergence properties theoretically, and includes simulations on mathematically defined problems, without specifying any particular hardware used for the computations.
Software Dependencies | No | The paper is a theoretical work and does not specify any software dependencies with version numbers for reproducing its findings.
Experiment Setup | No | The paper focuses on theoretical analysis of optimization algorithms. The numerical demonstrations involve setting mathematical parameters for algorithms and functions (e.g., "initialized at x0 = 0"), but these are not experimental setup details such as hyperparameters or training configurations for an empirical study.
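The demonstration quoted above (a quadratic whose matrix is Diag(µ, L) rotated counter-clockwise by 45°, with Figure 4 comparing A3, AGD and HB) can be approximated independently of the paper. The sketch below is a hypothetical reconstruction, not the authors' code: it uses standard textbook tunings of Nesterov's Accelerated Gradient Descent and Polyak's Heavy Ball method, a hypothetical minimizer `x_star`, and initialization at x0 = 0 as quoted; the paper's A3 algorithm is omitted because its definition is not given in this excerpt.

```python
import numpy as np

def rotated_quadratic(mu, L, theta=np.pi / 4):
    # A = R diag(mu, L) R^T: the diagonal matrix rotated counter-clockwise by 45 degrees
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return R @ np.diag([mu, L]) @ R.T

def agd(A, x_star, x0, iters):
    # Nesterov's AGD for smooth, strongly convex f(x) = 0.5 (x - x*)^T A (x - x*)
    eigs = np.linalg.eigvalsh(A)
    mu, L = eigs[0], eigs[-1]
    kappa = L / mu
    beta = (np.sqrt(kappa) - 1) / (np.sqrt(kappa) + 1)  # momentum coefficient
    x, y = x0.copy(), x0.copy()
    errs = []
    for _ in range(iters):
        grad = A @ (y - x_star)
        x_new = y - grad / L            # gradient step with step size 1/L
        y = x_new + beta * (x_new - x)  # extrapolation step
        x = x_new
        errs.append(np.linalg.norm(x - x_star))
    return errs

def heavy_ball(A, x_star, x0, iters):
    # Polyak's Heavy Ball method with the classical tuning for quadratics
    eigs = np.linalg.eigvalsh(A)
    mu, L = eigs[0], eigs[-1]
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
    x_prev, x = x0.copy(), x0.copy()
    errs = []
    for _ in range(iters):
        grad = A @ (x - x_star)
        x, x_prev = x - alpha * grad + beta * (x - x_prev), x
        errs.append(np.linalg.norm(x - x_star))
    return errs

mu_, L_ = 1.0, 100.0                 # hypothetical strong convexity / smoothness parameters
A = rotated_quadratic(mu_, L_)
x_star = np.array([1.0, 1.0])        # hypothetical minimizer; the paper's exact setup may differ
x0 = np.zeros(2)                     # "initialized at x0 = 0", as quoted from the paper
err_agd = agd(A, x_star, x0, 200)[-1]
err_hb = heavy_ball(A, x_star, x0, 200)[-1]
print(f"AGD final error: {err_agd:.2e}, Heavy Ball final error: {err_hb:.2e}")
```

With condition number κ = 100, both methods contract the error at a rate governed by √κ rather than κ, so both final errors are far below the initial distance after 200 iterations; this mirrors the kind of error-vs-iteration comparison the paper's Figure 4 reports.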