Convergence Analysis of Fractional Gradient Descent
Authors: Ashwani Aggarwal
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, empirical results will be presented on the potential speed up of fractional gradient descent over standard gradient descent as well as some preliminary theoretical results explaining this speed up. Figure 1 depicts convergence on a quadratic function for standard gradient descent as well as AT-CFGD and the method in Corollary 15 labeled Fractional Descent guided by Gradient. For specifically picked hyperparameters, both of these fractional methods can significantly outperform standard gradient descent. |
| Researcher Affiliation | Academia | Ashwani Aggarwal Department of Computer Science University of California, Los Angeles EMAIL |
| Pseudocode | No | The paper describes the fractional gradient descent method mathematically (e.g., the update x_{t+1} = x_t − η_t · C∇^{α,β}_{c_t} f(x_t)), but it does not contain explicit pseudocode or algorithm blocks with structured formatting or labels. |
| Open Source Code | No | No explicit statement or link for open-source code release for the methodology described in this paper was found. |
| Open Datasets | No | The paper conducts experiments on synthetic quadratic functions such as "f(x, y) = 10x² + y²" and "f(x) = xᵀ diag([10, 1, 1, 1, 1]) x". These are synthetic functions, not publicly available datasets, and no specific access information is provided for them. |
| Dataset Splits | No | The experiments in this paper are conducted on synthetic quadratic functions like f(x, y) = 10x² + y², which do not involve standard dataset splits for training, validation, or testing. |
| Hardware Specification | No | No specific hardware details (such as CPU/GPU models, processor types, or memory amounts) used for running the experiments were mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies or version numbers (e.g., library names with versions like Python 3.8, PyTorch 1.9) were mentioned for replicating the experiments. |
| Experiment Setup | Yes | Figure 1: Convergence of descent methods on the function f(x, y) = 10x² + y², beginning at x = 1, y = 10. In all cases, the optimal (not theoretical) step size is used. AT-CFGD is as described in Shin et al. (2021) with x(−1) = 1.5, y(−1) = 10.5, α = 1/2, β = 4/10. Fractional Descent guided by Gradient is the method discussed in Corollary 15 with α = 1/2, β = 4/10, and λ_t = 0.0675(t+1)^0.2 in x_t − c_t = λ_t ∇f(x_t). Figures 3 and 4 also specify "Hyper-parameters as in Corollary 15 are α = 1/2, β = 4/10, λ_t = 0.0675". |
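Since no open-source code accompanies the paper, a minimal baseline sketch may help orient reproduction attempts. The following runs standard gradient descent on the paper's test function f(x, y) = 10x² + y² from the reported start point (1, 10). It does not implement the fractional variants (AT-CFGD or the Corollary 15 method); the step size and iteration count here are illustrative choices, not values taken from the paper.

```python
# Baseline sketch (not the paper's fractional method): plain gradient
# descent on f(x, y) = 10*x**2 + y**2, starting at (1, 10).

def f(x, y):
    return 10 * x**2 + y**2

def grad_f(x, y):
    # Gradient of f: (20x, 2y)
    return 20 * x, 2 * y

def gradient_descent(x0, y0, eta, steps):
    x, y = x0, y0
    for _ in range(steps):
        gx, gy = grad_f(x, y)
        x, y = x - eta * gx, y - eta * gy
    return x, y

# For f = x^T diag(10, 1) x the curvature bounds are L = 20 and mu = 2,
# so the optimal fixed step size for gradient descent is 2 / (L + mu) = 1/11.
x, y = gradient_descent(1.0, 10.0, 2 / 22, 200)
print(f(x, y))  # converges close to 0
```

A reproduction of the fractional methods would replace `grad_f` with the Caputo-type fractional gradient operator C∇^{α,β}_{c_t} using the hyper-parameters listed in the table above.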