The Power of Random Features and the Limits of Distribution-Free Gradient Descent
Authors: Ari Karchmer, Eran Malach
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Our main result shows that if a parametric model can be learned using mini-batch stochastic gradient descent (bSGD) without making assumptions about the data distribution, then with high probability the target function can also be approximated using a polynomial-sized combination of random features. Along the way, we also introduce a new theoretical framework called average probabilistic dimension complexity (adc), which extends the probabilistic dimension complexity developed by Kamath et al. (2020). We prove that adc is polynomially related to statistical query dimension, and use this relationship to demonstrate an infinite separation between adc and standard dimension complexity. |
| Researcher Affiliation | Academia | Harvard University, Cambridge, Massachusetts, USA. |
| Pseudocode | Yes | Algorithm 1 Predict(z) — 1: input: point z; example oracle access to function f ∼ µ, with respect to example distribution ρ. 2: Sample g ∼ µ. 3: Sample (x, f(x)) from the example oracle. 4: predict: g(z) · g(x) · f(x). |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating the release of open-source code for the methodology described. |
| Open Datasets | No | This paper is theoretical and focuses on mathematical proofs and definitions; it does not present empirical results or use specific datasets that would be made publicly available. |
| Dataset Splits | No | This is a theoretical paper that does not involve empirical experiments requiring dataset splits. |
| Hardware Specification | No | This is a theoretical paper and does not describe any experimental hardware used. |
| Software Dependencies | No | This is a theoretical paper and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | This is a theoretical paper and does not include details on experimental setup, hyperparameters, or training configurations. |
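The Predict routine quoted in the Pseudocode row can be sketched in a few lines. The sketch below assumes ±1-valued target functions, so that when the sampled g happens to equal f, the product g(z) · g(x) · f(x) collapses to f(z) since g(x)² = 1. The distribution µ, the example oracle, and the toy function class are illustrative stand-ins, not the paper's constructions.

```python
import random

def predict(z, mu, example_oracle):
    """One-shot prediction for a point z (sketch of Algorithm 1, Predict).

    mu: callable returning a random function g ~ mu.
    example_oracle: callable returning one labeled example (x, f(x)).
    """
    g = mu()                  # Step 2: sample g ~ mu
    x, fx = example_oracle()  # Step 3: draw one example from the oracle
    # Step 4: for +/-1-valued functions, if g = f then
    # g(z) * g(x) * f(x) = g(z) * g(x)^2 = f(z).
    return g(z) * g(x) * fx

# Toy instantiation (hypothetical): mu is uniform over two +/-1 functions
# on {0,1}^2, f is one of them, and examples are drawn uniformly.
funcs = [lambda v: 1 if v[0] == v[1] else -1,
         lambda v: 1 if v[0] != v[1] else -1]
f = funcs[0]

def mu():
    return random.choice(funcs)

def oracle():
    x = (random.randint(0, 1), random.randint(0, 1))
    return x, f(x)

pred = predict((0, 0), mu, oracle)
```

In this toy class the two functions are negations of each other, so the sign flips cancel and the prediction recovers f(z) regardless of which g is drawn; in general the guarantee is only probabilistic over µ.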