Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models

Authors: Taj Jones-Mccormick, Aukosh Jagannath, Subhabrata Sen

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5.2. Simulations: We conduct a few simulations to empirically demonstrate our claims in finite dimensions. In the first simulation, we consider letting f(x) = x^3 − 3x and setting λ = 1, η_1 = 0.45. We then conduct SGD from both random initializations and from estimates of v obtained via PCA. We use dimension d = 1000 and let SGD run for (3/2)d^2 = 1,500,000 steps of size 1/(10d^2) = (10,000,000)^{-1}. We select the parameters such that we would expect to be able to recover the true parameter vector from a random initialization had we been in the case λ = 0. We determine this scaling based on the results of Ben Arous et al. (2021) and some experimenting. See Figure 1.
Researcher Affiliation | Academia | 1Department of Statistics and Actuarial Science, University of Waterloo, Canada 2Cheriton School of Computer Science, University of Waterloo, Canada 3Department of Statistics, Harvard University, United States of America. Correspondence to: Taj Jones-Mccormick <EMAIL>, Aukosh Jagannath <EMAIL>, Subhabrata Sen <EMAIL>.
Pseudocode | No | The paper describes the Stochastic Gradient Descent (SGD) updates using mathematical equations, but it does not present these as a clearly labeled algorithm block or in pseudocode format. For example, it defines the update rule as X_{t+1} = X_t − (δ/d) ∇L(X_t, y) / ||∇L(X_t, y)||, but this is embedded within the text describing the method rather than presented as a distinct algorithm.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository in the main text or supplementary sections.
Open Datasets | No | The paper describes generating synthetic data based on single-index models with Gaussian features and spiked covariance for its theoretical analysis and simulations, stating: 'Let the labeled data be (y_i, a_i)_{i=1}^N, with each (y_i, a_i) independent and identically distributed.' It does not refer to any specific publicly available dataset by name or provide any access information (links, DOIs, or citations to existing public datasets).
Dataset Splits | No | The paper uses synthetic data generated based on a model for its analysis and simulations. While it refers to the 'total number of steps (and samples of (y_i, a_i)) given by N = α_d d', it does not specify explicit training, validation, or test dataset splits. The data is generated on the fly for the purpose of the theoretical and simulation analysis, without detailing a reproducible splitting methodology for evaluation.
Hardware Specification | No | The paper describes simulations in Section 5.2, but it does not provide any specific details about the hardware used to run these experiments (e.g., GPU models, CPU types, memory specifications). It only mentions the dimension `d = 1000` for the simulations.
Software Dependencies | No | The paper does not provide any specific ancillary software details, such as programming languages, libraries, or solvers with version numbers, that were used for the implementation or experiments.
Experiment Setup | Yes | In Section 5.2 'Simulations', the paper explicitly details parameters for the experiments: 'We consider letting f(x) = x^3 − 3x and setting λ = 1, η_1 = 0.45. We use dimension d = 1000 and let SGD run for (3/2)d^2 = 1,500,000 steps of size 1/(10d^2) = (10,000,000)^{-1}.' It also mentions initializing 'from both random initializations and from estimates of v obtained via PCA.'
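The normalized-gradient update rule quoted in the Pseudocode row, X_{t+1} = X_t − (δ/d) ∇L(X_t, y)/||∇L(X_t, y)||, can be sketched in a few lines. This is only an illustration of that formula as it appears in the report, not the authors' implementation; the function name `sgd_step` and the toy loss gradient are hypothetical.

```python
import numpy as np

def sgd_step(x, grad, delta, d):
    """Normalized-gradient step: X_{t+1} = X_t - (delta/d) * grad / ||grad||.

    The step length is always delta/d, regardless of the gradient's magnitude,
    because the gradient is rescaled to unit norm before being applied.
    """
    return x - (delta / d) * grad / np.linalg.norm(grad)

# Toy check: with grad = x, each step shrinks ||x|| by exactly delta/d.
x = np.ones(4)                              # ||x|| = 2
x_new = sgd_step(x, grad=x, delta=0.1, d=4)  # ||x_new|| = 2 - 0.1/4 = 1.975
```

The normalization makes the iterate move a fixed distance delta/d per step, which is what makes the (3/2)d^2-steps-of-size-1/(10d^2) scaling in the Experiment Setup row meaningful.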
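The simulation setup in the Research Type and Experiment Setup rows (spiked Gaussian features, single-index labels with f(x) = x^3 − 3x, and SGD initialized either randomly or from a PCA estimate of v) can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the exact spiked-covariance form of the features, the sample size used for PCA, and the reduced dimension are all assumptions made here for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam = 100, 1.0                 # the paper uses d = 1000; smaller here for speed

# Assumed data model: spiked Gaussian features a = z + sqrt(lam) * g * v with
# z ~ N(0, I_d), g ~ N(0, 1), and single-index labels y = f(<a, v>).
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
f = lambda x: x**3 - 3 * x

def sample(n):
    z = rng.standard_normal((n, d))
    g = rng.standard_normal(n)
    a = z + np.sqrt(lam) * np.outer(g, v)
    return a, f(a @ v)

# PCA initialization: top eigenvector of the sample covariance of the features
# (labels are not used, which is what makes this pre-training unsupervised).
A, _ = sample(5 * d)
eigvals, eigvecs = np.linalg.eigh(A.T @ A / len(A))
x_pca = eigvecs[:, -1]

# Random initialization: uniform on the unit sphere.
x_rand = rng.standard_normal(d)
x_rand /= np.linalg.norm(x_rand)

# The PCA start has macroscopic overlap with v; the random start's overlap
# is only of order 1/sqrt(d).
print(abs(x_pca @ v), abs(x_rand @ v))
```

Running SGD for (3/2)d^2 steps of size 1/(10d^2) from each of these two starting points, as the quoted passage describes, would then compare how the two initializations recover v.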