Convergence Analysis of Fractional Gradient Descent

Authors: Ashwani Aggarwal

TMLR 2024

Reproducibility variables, results, and LLM responses:

Research Type: Experimental. "Finally, empirical results will be presented on the potential speed up of fractional gradient descent over standard gradient descent as well as some preliminary theoretical results explaining this speed up. Figure 1 depicts convergence on a quadratic function for standard gradient descent as well as AT-CFGD and the method in Corollary 15, labeled Fractional Descent guided by Gradient. For specifically picked hyperparameters, both of these fractional methods can significantly outperform standard gradient descent."

Researcher Affiliation: Academia. Ashwani Aggarwal, Department of Computer Science, University of California, Los Angeles (EMAIL).

Pseudocode: No. The paper describes the fractional gradient descent method mathematically (e.g., "x_{t+1} = x_t - η_t C^{α,β}_{δ,c_t} f(x_t)"), but it does not contain explicit pseudocode or algorithm blocks with structured formatting or labels.

Open Source Code: No. No explicit statement or link to an open-source code release for the methodology described in this paper was found.

Open Datasets: No. The paper conducts experiments on synthetic quadratic functions such as f(x, y) = 10x^2 + y^2 and f(x) = x^T diag([10, 1, 1, 1, 1]) x. These are synthetic functions, not publicly available datasets, and no specific access information is provided for them.

Dataset Splits: No. The experiments are conducted on synthetic quadratic functions like f(x, y) = 10x^2 + y^2, which do not involve standard training, validation, or test splits.

Hardware Specification: No. No specific hardware details (such as CPU/GPU models, processor types, or memory amounts) used for running the experiments are mentioned in the paper.

Software Dependencies: No. No specific software dependencies or version numbers (e.g., library names with versions like Python 3.8 or PyTorch 1.9) are mentioned for replicating the experiments.

Experiment Setup: Yes. "Figure 1: Convergence of descent methods on function f(x, y) = 10x^2 + y^2 beginning at x = 1, y = 10. In all cases, the optimal (not theoretical) step size is used. AT-CFGD is as described in Shin et al. (2021) with x(-1) = 1.5, y(-1) = 10.5, α = 1/2, β = 4/10. Fractional Descent guided by Gradient is the method discussed in Corollary 15 with α = 1/2, β = 4/10, and λ_t = 0.0675(t+1)^{0.2} in x_t - c_t = λ_t ∇f(x_t)." Figures 3 and 4 additionally specify "hyper-parameters as in Corollary 15 are α = 1/2, β = 4/10, λ_t = 0.0675".
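The quadratic test function in the experiment setup is simple enough that the standard gradient descent baseline from Figure 1 can be sketched in a few lines. This is not the paper's code (none was released); the constant step size 1/11 = 2/(L + μ) for the Hessian eigenvalues 20 and 2 is a textbook choice for this quadratic and is our assumption, not the paper's tuned value.

```python
import numpy as np

def f(p):
    # The paper's test function f(x, y) = 10x^2 + y^2.
    return 10.0 * p[0] ** 2 + p[1] ** 2

def grad_f(p):
    # Gradient of f: (20x, 2y).
    return np.array([20.0 * p[0], 2.0 * p[1]])

def gradient_descent(p0, eta, steps):
    # Plain gradient descent x_{t+1} = x_t - eta * grad f(x_t),
    # recording the objective value at every iterate.
    p = np.array(p0, dtype=float)
    history = [f(p)]
    for _ in range(steps):
        p = p - eta * grad_f(p)
        history.append(f(p))
    return p, history

# Start at (1, 10) as in Figure 1; eta = 1/11 is our assumed step size.
p_final, hist = gradient_descent([1.0, 10.0], eta=1 / 11, steps=100)
print(hist[0], hist[-1])  # f decreases geometrically toward 0
```

For this quadratic each coordinate contracts by the factor |1 - eta * lambda_i| = 9/11 per step, so the objective shrinks by (9/11)^2 per iteration; the fractional methods in the report are compared against exactly this kind of baseline trace.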