On the Almost Sure Convergence of the Stochastic Three Points Algorithm

Authors: Taha EL BAKKALI EL KADI, Omar Saadi

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "6 NUMERICAL EXPERIMENTS: In our experiment, we run 50 trajectories for each of the three algorithms: the STP algorithm, the RGF algorithm (Nesterov & Spokoiny, 2017), and the GLD algorithm (Golovin et al., 2020), all starting from the same initial point 0. We simulate log10(‖∇f(θ_T)‖²) as a function of the number of iterations, as well as the elapsed time in seconds. Additionally, to verify the rate assured by Theorem 1 for the STP algorithm, we simulate T^0.49 · min_{t≤T} ‖∇f(θ_t)‖² as a function of the number of iterations. Figure 1 and Figure 2 illustrate the logarithmic decay of the gradient norm vs. iterations and time."
Researcher Affiliation | Academia | Taha EL BAKKALI EL KADI, Omar SAADI, UM6P College of Computing, Benguerir, Morocco
Pseudocode | Yes | Algorithm 1: Stochastic Three Points (STP)
1: Input: θ₁ ∈ R^d, step-size sequence {α_t}_{t≥1} ∈ (0, ∞)^N, probability distribution D on R^d.
2: for t = 1, 2, . . . do
3:   Generate a random vector s_t ∼ D
4:   θ_{t+1} = argmin_{θ ∈ {θ_t, θ_t + α_t s_t, θ_t − α_t s_t}} f(θ)
5: end for
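As a rough illustration, Algorithm 1 can be sketched in NumPy. The function names, the small quadratic test problem, and the iteration count below are our own choices for demonstration, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def stp(f, theta1, alpha, sample_direction, num_iters):
    """Stochastic Three Points: at each step, keep the best of
    theta_t, theta_t + alpha_t * s_t, and theta_t - alpha_t * s_t."""
    theta = np.asarray(theta1, dtype=float)
    for t in range(1, num_iters + 1):
        s = sample_direction()           # s_t ~ D
        a = alpha(t)                     # step size alpha_t
        candidates = (theta, theta + a * s, theta - a * s)
        theta = min(candidates, key=f)   # argmin over the three points
    return theta

# Illustration on a small quadratic in R^5 (not the paper's test function)
d = 5

def unit_sphere():
    g = rng.standard_normal(d)
    return g / np.linalg.norm(g)

f = lambda th: 0.5 * float(th @ th)
theta0 = np.ones(d)
theta = stp(f, theta0, lambda t: 4.0 / t**0.51, unit_sphere, 2000)
```

Because θ_t itself is always among the three candidates, f(θ_t) is non-increasing by construction, which is the property the almost-sure convergence analysis builds on.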
Open Source Code | No | The paper does not contain any explicit statement about providing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | "Let's consider the following optimization problem: min_{θ∈R^d} f(θ) = ½(θ^1)² + ½ Σ_{i=1}^{d−1} (θ^{i+1} − θ^i)² + ½(θ^d)² − θ^1, initial vector: θ₁ = 0, where d = 500."
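This chain-structured quadratic and its gradient admit a short NumPy sketch (the variable names are ours). Its Hessian is the tridiagonal matrix tridiag(−1, 2, −1), whose largest eigenvalue is below 4, consistent with a smoothness constant L ≤ 4:

```python
import numpy as np

d = 500

def f(theta):
    # f(θ) = ½(θ^1)² + ½ Σ_{i=1}^{d-1} (θ^{i+1} − θ^i)² + ½(θ^d)² − θ^1
    diffs = np.diff(theta)
    return (0.5 * theta[0] ** 2 + 0.5 * np.sum(diffs ** 2)
            + 0.5 * theta[-1] ** 2 - theta[0])

def grad_f(theta):
    # ∇f(θ) = Aθ − e_1, with A = tridiag(−1, 2, −1)
    g = 2.0 * theta
    g[:-1] -= theta[1:]
    g[1:] -= theta[:-1]
    g[0] -= 1.0
    return g

theta1 = np.zeros(d)   # initial point θ₁ = 0
```

At the initial point θ₁ = 0, f(θ₁) = 0 and ‖∇f(θ₁)‖² = 1, which is the starting value of the gradient-norm curves plotted in the experiments.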
Dataset Splits | No | The numerical experiments are conducted on a synthetic objective function, not a dataset requiring explicit splits for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not specify version numbers for any software dependencies, libraries, or programming languages used in the experiments.
Experiment Setup | Yes | "For the STP algorithm, we set the step sizes to be α_t = 4/t^0.51, and the random search directions s_t are generated uniformly on the unit sphere of R^d. The chosen step sizes adhere to the form provided in the second result of Theorem 1, where ε = 0.01. In this implementation, we set µ_t = 10^−4 and h_t = 1/L, where L ≤ 4 represents the smoothness parameter of the objective function. For this algorithm [GLD], we use the standard Gaussian distribution D and set r = 10^−5 and R = 10^−4."
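The STP step-size schedule and direction distribution described in this setup can be sketched as follows; the helper names and the RNG seed are our own, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(42)

def alpha(t):
    # STP step sizes alpha_t = 4 / t^0.51 (Theorem 1 form with epsilon = 0.01)
    return 4.0 / t ** 0.51

def sample_sphere(d):
    # search direction s_t drawn uniformly on the unit sphere of R^d:
    # normalize a standard Gaussian vector
    g = rng.standard_normal(d)
    return g / np.linalg.norm(g)
```

The exponent 0.51 keeps the schedule just inside the square-summability regime that the theorem's second result requires, while decaying as slowly as possible.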