Stability and Generalization for Stochastic (Compositional) Optimizations

Authors: Xiaokang Pan, Jin Liu, Hulin Kuang, Youqi Li, Lixing Chen, Zhe Qu

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted a toy experiment to compare SGD and STORM in convex and non-convex settings, as shown in Figure 1. In the non-convex setting, STORM demonstrated a faster convergence rate than SGD, as shown in Figure 1b, and it also exhibited superior generalization performance because the gap between training loss and testing loss is smaller. However, in the convex setting (Figure 1a), the trend reversed. It can be noticed that the estimator changes not only the convergence rate of the algorithm but also its generalization. This observation raises an interesting question: How does the estimator affect generalization? Therefore, in this paper, we analyze the effect of estimators on generalization under the SO and SCO problems. We first conduct a generalization analysis of STORM, comparing the results to SGD and exploring the estimator's impact on generalization as an initial step. Considering the complexity of the SCO problem, we then propose a generalized algorithmic framework that encompasses many existing algorithms and applies to multiple estimators and various estimation strategies, i.e., the objects to which the estimator is applied. Subsequently, we analyze the generalization of this framework in both convex and non-convex settings. Through our analysis, we identify the estimators that are crucial and those that are less important. Furthermore, we perform a generalization analysis of three representative algorithms (SCGD, SCSC, and COVER) under the SCO problem to compare how different estimation strategies impact algorithm generalization. Based on this analysis, we also establish a relationship between the convergence rate and generalization, summarized in Table 2.
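The excerpt contrasts plain SGD with STORM's momentum-based variance-reduced estimator. As a minimal sketch of the difference (function names and the noisy quadratic objective are illustrative, not from the paper): SGD uses the fresh stochastic gradient directly, while STORM corrects the previous direction with a gradient difference evaluated on the same fresh sample, d_t = ∇f(x_t; ξ_t) + (1 − β)(d_{t−1} − ∇f(x_{t−1}; ξ_t)).

```python
import numpy as np

def sgd_step(x, grad_fn, sample, lr):
    # Plain SGD: the update direction is the fresh stochastic gradient.
    return x - lr * grad_fn(x, sample)

def storm_step(x, x_prev, d_prev, grad_fn, sample, lr, beta):
    # STORM estimator: reuse the previous direction d_prev, corrected by a
    # gradient difference computed on the SAME fresh sample at both iterates.
    d = grad_fn(x, sample) + (1.0 - beta) * (d_prev - grad_fn(x_prev, sample))
    return x - lr * d, d

if __name__ == "__main__":
    # Toy noisy quadratic: minimize 0.5 * ||x - b||^2 with additive
    # Gaussian gradient noise (illustrative stand-in for the paper's setup).
    rng = np.random.default_rng(0)
    b = np.ones(5)
    grad_fn = lambda x, noise: (x - b) + noise

    x = np.zeros(5)
    x_prev = x.copy()
    d_prev = grad_fn(x, rng.normal(0.0, 0.1, 5))  # d_0: plain stochastic gradient
    x = x - 0.1 * d_prev
    for _ in range(100):
        noise = rng.normal(0.0, 0.1, 5)
        x_new, d_prev = storm_step(x, x_prev, d_prev, grad_fn, noise, 0.1, 0.5)
        x_prev, x = x, x_new
    print(np.linalg.norm(x - b))  # distance to the optimum after 100 steps
```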
Researcher Affiliation | Academia | Xiaokang Pan (1,2), Jin Liu (1,2), Hulin Kuang (1,2), Youqi Li (3), Lixing Chen (4,5), Zhe Qu (1,2). (1) School of Computer Science and Engineering, Central South University; (2) Xiangjiang Laboratory; (3) School of Computer Science and Technology, Beijing Institute of Technology; (4) School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; (5) Shanghai Key Laboratory of Integrated Administration Technologies for Information Security. EMAIL, EMAIL, EMAIL
Pseudocode | Yes | Algorithm 1: Framework of SCO
1: Inputs: Training data S = {ν_1, …, ν_n, ω_1, …, ω_m}; number of iterations T; parameter sequences {η_t}, {β_t}
2: Initialize x_0 ∈ X and y_0 ∈ R^d
3: for t = 0 to T − 1 do
4:   Randomly sample j_t ∈ [1, m]; obtain g_{ω_{j_t}}(x_t) and ∇g_{ω_{j_t}}(x_t) ∈ R^{d×d_2}
5:   Estimate the inner function value u_t according to Eq. (5)
6:   Estimate the inner function gradient v_t according to Eq. (6) or Eq. (7)
7:   Randomly sample i_t ∈ [1, n]; obtain ∇f_{ν_{i_t}}(u_t) ∈ R^{d_2}
8:   Calculate the total gradient v̂_t = v_t ∇f_{ν_{i_t}}(u_t)
9:   Update: x_{t+1} = Π_X(x_t − η_t v̂_t)
10: end for
11: Outputs: A(S) = x_T or x_τ with τ ∼ Unif({1, …, T})
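A minimal executable sketch of Algorithm 1's loop structure follows. The excerpt does not reproduce Eqs. (5)-(7), so simple moving-average estimators stand in for the inner value and inner Jacobian estimates; the function name `sco_framework`, the sampler callbacks, and the unconstrained setting (projection Π_X omitted, i.e., X = R^d) are all assumptions made for illustration.

```python
import numpy as np

def sco_framework(x0, T, eta, beta, g, grad_g, grad_f, sample_omega, sample_nu):
    """Sketch of Algorithm 1 (Framework of SCO) for F(x) = E_nu f(E_omega g(x)).

    g(x, omega) returns the stochastic inner value; grad_g(x, omega) its
    Jacobian; grad_f(u, nu) the stochastic outer gradient. Moving averages
    with weight beta stand in for the paper's estimators in Eqs. (5)-(7).
    """
    x = x0.copy()
    u = g(x, sample_omega())       # inner function value estimate u_t
    J = grad_g(x, sample_omega())  # inner function gradient estimate v_t
    for _ in range(T):
        omega = sample_omega()
        u = (1.0 - beta) * u + beta * g(x, omega)        # Eq. (5) stand-in
        J = (1.0 - beta) * J + beta * grad_g(x, omega)   # Eq. (6)/(7) stand-in
        v_hat = J.T @ grad_f(u, sample_nu())             # chain-rule total gradient
        x = x - eta * v_hat                              # projection omitted: X = R^d
    return x

if __name__ == "__main__":
    # Toy composition: g(x) = x + noise, f(u) = 0.5 * ||u - b||^2 + noise term.
    rng = np.random.default_rng(1)
    b = np.ones(3)
    x_out = sco_framework(
        x0=np.zeros(3), T=200, eta=0.1, beta=0.2,
        g=lambda x, w: x + w,
        grad_g=lambda x, w: np.eye(3),
        grad_f=lambda u, nu: (u - b) + nu,
        sample_omega=lambda: rng.normal(0.0, 0.05, 3),
        sample_nu=lambda: rng.normal(0.0, 0.05, 3),
    )
    print(np.linalg.norm(x_out - b))  # distance to the optimum
```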
Open Source Code | No | The paper does not contain any explicit statements about releasing source code or links to a code repository.
Open Datasets | No | For SO problems, we use linear regression in both convex and non-convex settings. For SCO problems, we generate two datasets, S1 and S2, with |S1| = |S2|. The inner function g(·) fits S1, and f(g(·)) fits S2. We aim to minimize the overall loss of the fits on both datasets.
Dataset Splits | No | For SCO problems, we generate two datasets, S1 and S2, with |S1| = |S2|. The inner function g(·) fits S1, and f(g(·)) fits S2. We aim to minimize the overall loss of the fits on both datasets. The paper mentions generating datasets but does not specify train/test/validation splits for these datasets.
Hardware Specification | No | We conduct simulations to validate our theoretical results by examining the relationship between convergence and generalization. To simulate high-dimensional issues, we set the dimension of X to 100. The paper mentions conducting simulations but does not specify the hardware used for these simulations.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | To simulate high-dimensional issues, we set the dimension of X to 100. We generate data using the objective function and add Gaussian noise with mean 0 and variance 1 to each dimension to mimic stochastic optimization. We use MSE loss for the convex setting and combine the tanh activation function with MSE loss for the non-convex setting, resulting in a total loss of MSE loss + α·tanh(y_predict), where α = 0.1 in both SO and SCO problems. For SO problems, we use linear regression in both convex and non-convex settings. For SCO problems, we generate two datasets, S1 and S2, with |S1| = |S2|. The inner function g(·) fits S1, and f(g(·)) fits S2. We aim to minimize the overall loss of the fits on both datasets. For iteration counts of different methods, we use the results from Table 2 and round to the nearest integer.
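The two losses and the data-generation recipe described above can be sketched directly. This is a hedged reconstruction from the setup text only: the helper names (`convex_loss`, `nonconvex_loss`, `make_data`) and the linear ground-truth model are assumptions; the paper specifies only dimension 100, unit-variance Gaussian noise, MSE, and the α·tanh term with α = 0.1.

```python
import numpy as np

ALPHA = 0.1  # from the setup: alpha = 0.1 in both SO and SCO problems

def convex_loss(y_pred, y_true):
    # Convex setting: plain MSE over the batch.
    return np.mean((y_pred - y_true) ** 2)

def nonconvex_loss(y_pred, y_true, alpha=ALPHA):
    # Non-convex setting: MSE plus alpha * tanh(y_pred), averaged over the batch.
    return np.mean((y_pred - y_true) ** 2 + alpha * np.tanh(y_pred))

def make_data(n, w_true, rng):
    # Synthetic regression data as described: inputs of dimension 100, with
    # Gaussian noise (mean 0, variance 1) added to mimic stochastic optimization.
    X = rng.normal(size=(n, w_true.shape[0]))
    y = X @ w_true + rng.normal(0.0, 1.0, size=n)
    return X, y
```

A perfect prediction gives zero loss in both settings when y_pred = y_true = 0, since tanh(0) = 0; for nonzero predictions the tanh term makes the non-convex objective asymmetric around the MSE minimum.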