Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Stochastic Zeroth-Order Optimization under Nonstationarity and Nonconvexity
Authors: Abhishek Roy, Krishnakumar Balasubramanian, Saeed Ghadimi, Prasant Mohapatra
JMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we illustrate our results on nonstationary nonconvex optimization through simulation, and compare the performance of zeroth-order methods with their higher-order counterparts. [...] In Figure 1 we show the evolution of $R_G^{(2)}(T)$ over a horizon of $T = 2000$ with $W_T = T$ using Algorithm 1. Figures 1a and 1b present the results for the two-point and one-point settings respectively. All the algorithm parameters are chosen as described in Theorem 6 and Theorem 18 in the corresponding cases. In both cases the performance of Algorithm 1 is comparable to the first-order variant, i.e., when the stochastic gradient is available. Theoretical rates obtained in Theorem 6 and Theorem 18 are also shown (red, dashed line) for comparison purposes. In Figure 2 we show $R_{ENC}(T)$ over a horizon of $T = 1000$ with $W_T = T$ using Algorithms 2 and 3. Figures 2a and 2b present the results for the two-point and one-point settings respectively. For the two-point setting, all the algorithm parameters are chosen as described in Theorem 12 and Theorem 15 in the higher-order (HO) and zeroth-order (ZO) cases respectively. For the one-point setting the algorithm parameters are chosen as stated in Theorem 22. One can see that the performance of Algorithm 3 is comparable to Algorithm 2, where the higher-order information is available. Theoretical rates for $R_{ENC}(T)$ obtained in Theorem 12, Theorem 15, and Theorem 22 are also shown (red, dashed line) for comparison purposes. |
| Researcher Affiliation | Academia | Abhishek Roy EMAIL Department of Statistics University of California Davis, CA 95616, USA Krishnakumar Balasubramanian EMAIL Department of Statistics University of California Davis, CA 95616, USA Saeed Ghadimi EMAIL Department of Management Sciences University of Waterloo Waterloo, ON N2L 3G1, Canada Prasant Mohapatra EMAIL Department of Computer Science University of California Davis, CA 95616, USA |
| Pseudocode | Yes | Algorithm 1 Gaussian Zeroth-order Gradient Descent (GZGD) [...] Algorithm 2 Online Cubic-Regularized Newton Algorithm (OCRN) [...] Algorithm 3 Zeroth-order Cubic Regularized Newton Algorithm (BCRN) |
| Open Source Code | No | The paper does not explicitly provide concrete access to source code for the methodology described. |
| Open Datasets | No | For $i = 1, 2, 3, \ldots$, consider the functions $f_t(x) = \sqrt{S_I W_T/(2T)}\,\sin(\omega x)$ for $2(i-1)S_I + 1 \le t \le (2i-1)S_I$, and $f_t(x) = -\sqrt{S_I W_T/(2T)}\,\sin(\omega x)$ for $(2i-1)S_I + 1 \le t \le 2iS_I$. These functions belong to the uncertainty set $\mathcal{D}_T$, since the amplitude sign flips at most $T/S_I$ times and each flip contributes at most $2S_I W_T/T$ to $\sum_{t=1}^{T}\|f_t - f_{t+1}\|^2$. For the two-point setting, we assume that the multiplicative noise $\xi$ in the function evaluation is sampled from a uniform distribution $U[1-\sigma, 1+\sigma]$. It is easy to see that $F_t(x, \xi) = \xi f_t(x)$ satisfies Assumptions 2.1–2.4 for $t = 1, 2, \ldots, T$. For the one-point setting, we sample the additive noise $\xi$ from $N(0, \sigma^2)$. Then Assumption 3.1, in addition to Assumptions 2.1–2.4, holds true for $F_t(x, \xi) = f_t(x) + \xi$. For this experiment we set $S_I = 12$, $\omega = 22$, $x_0 = 0.078$, and $\sigma = 0.5$. The experiments use synthetically generated functions for simulation, not a publicly available dataset. |
| Dataset Splits | No | The paper describes experiments with synthetic functions and simulations over a time horizon T, but does not involve explicit dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper describes theoretical results and simulations but does not specify any particular hardware used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or their version numbers. |
| Experiment Setup | Yes | For this experiment we set $S_I = 12$, $\omega = 22$, $x_0 = 0.078$, and $\sigma = 0.5$. Since in this paper we are interested in the expected regret, Figure 1 and Figure 2 show curves averaged over 50 simulations. In Figure 1 we show the evolution of $R_G^{(2)}(T)$ over a horizon of $T = 2000$ with $W_T = T$ using Algorithm 1. Figures 1a and 1b present the results for the two-point and one-point settings respectively. All the algorithm parameters are chosen as described in Theorem 6 and Theorem 18 in the corresponding cases. In both cases the performance of Algorithm 1 is comparable to the first-order variant, i.e., when the stochastic gradient is available. Theoretical rates obtained in Theorem 6 and Theorem 18 are also shown (red, dashed line) for comparison purposes. In Figure 2 we show $R_{ENC}(T)$ over a horizon of $T = 1000$ with $W_T = T$ using Algorithms 2 and 3. Figures 2a and 2b present the results for the two-point and one-point settings respectively. For the two-point setting, all the algorithm parameters are chosen as described in Theorem 12 and Theorem 15 in the higher-order (HO) and zeroth-order (ZO) cases respectively. For the one-point setting the algorithm parameters are chosen as stated in Theorem 22. |
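The setup quoted in the table (piecewise sine objectives whose amplitude sign flips every $S_I$ rounds, multiplicative noise $\xi \sim U[1-\sigma, 1+\sigma]$, and a Gaussian two-point zeroth-order gradient estimate) can be sketched as below. This is an illustrative reconstruction, not the authors' code: the step size `eta`, the smoothing parameter `mu`, and sharing one noise draw across the two function evaluations are assumptions, not the parameter schedules prescribed by Theorems 6 and 18.

```python
import numpy as np

# Quoted experiment parameters: S_I = 12, omega = 22, x0 = 0.078, sigma = 0.5,
# horizon T = 2000 with W_T = T. eta and mu are illustrative assumptions.
T, S_I, omega, x0, sigma = 2000, 12, 22, 0.078, 0.5
W_T = float(T)
eta, mu = 0.01, 1e-3
rng = np.random.default_rng(0)

amp = np.sqrt(S_I * W_T / (2 * T))

def f(t, x):
    """Piecewise sine family: amplitude sign flips every S_I rounds."""
    sign = 1.0 if ((t - 1) // S_I) % 2 == 0 else -1.0
    return sign * amp * np.sin(omega * x)

def two_point_grad(t, x):
    """Gaussian two-point zeroth-order gradient estimate with multiplicative
    noise xi ~ U[1 - sigma, 1 + sigma]; using the same xi for both
    evaluations is an assumption about the two-point oracle."""
    u = rng.standard_normal()          # Gaussian smoothing direction
    xi = rng.uniform(1 - sigma, 1 + sigma)
    return (xi * f(t, x + mu * u) - xi * f(t, x)) / mu * u

# Zeroth-order gradient descent over the nonstationary sequence f_1, ..., f_T.
x = x0
traj = []
for t in range(1, T + 1):
    x = x - eta * two_point_grad(t, x)
    traj.append(f(t, x))
```

In the one-point setting described in the table, `two_point_grad` would instead query a single noisy evaluation `f(t, x + mu * u) + xi` with additive Gaussian noise, trading a higher-variance estimate for one oracle call per round.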