No Free Lunch from Random Feature Ensembles: Scaling Laws and Near-Optimality Conditions
Authors: Benjamin Samuel Ruben, William Lingxiao Tong, Hamza Tahir Chaudhry, Cengiz Pehlevan
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate monotonicity with P and N in Fig. 1, where we plot E^g_K as a function of both sample size P and the network size N in ensembles of ReLU random feature models applied to a binarized CIFAR-10 image classification task (see Appendix E.2). Numerically, we verify that error monotonicity with P and N holds at the level of a 0-1 loss on the predicted classes of held-out test examples for both score-averaging and majority-vote ensembling over the predictors (see Fig. S3). |
| Researcher Affiliation | Academia | 1Biophysics PhD Program, Harvard University, Cambridge, MA 02138, USA 2John A. Paulson School of Engineering and Applied Science, Harvard University, Cambridge, MA 02138, USA 3Center for Brain Science, Harvard University, Cambridge, MA 02138, USA 4Kempner Institute, Harvard University, Cambridge, MA 02138, USA. |
| Pseudocode | No | The paper describes methods and derivations using mathematical equations and prose (e.g., in Section 2, 'Preliminaries' and Appendix A), but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code used to generate the figures presented in this work is publicly available at https://github.com/benruben87/RandomFeatureEnsembles.git. |
| Open Datasets | Yes | We demonstrate monotonicity with P and N in Fig. 1, where we plot E^g_K as a function of both sample size P and the network size N in ensembles of ReLU random feature models applied to a binarized CIFAR-10 image classification task (see Appendix E.2). |
| Dataset Splits | No | The paper mentions using "training sets" of MNIST and CIFAR-10, and also refers to "held-out test examples". However, it does not provide specific details on the exact percentages, sample counts, or methodology used for creating training, validation, and test splits needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific details such as GPU models, CPU types, or other hardware specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'neural tangents library (Novak et al., 2019)' and the 'scipy library (Virtanen et al., 2020)'. However, it does not provide specific version numbers for these software dependencies, which are required for a reproducible description. |
| Experiment Setup | Yes | We fix N = 256 and vary both P and K. Color corresponds to the regularization λ. Markers show numerical experiments and dotted lines show theoretical predictions. Error is monotonically decreasing with P provided that the regularization λ is tuned to its optimal value. |
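The experimental setup quoted above (ensembles of K ReLU random-feature ridge regressors with regularization λ, combined by score-averaging or majority vote on a binary classification task) can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: the function names, the ±1 label convention, and all hyperparameter values below are assumptions.

```python
import numpy as np

def random_feature_scores(X_train, y_train, X_test, n_features, lam, rng):
    """Fit one ReLU random-feature ridge regressor; return test-set scores."""
    d = X_train.shape[1]
    # Random projection with variance 1/d, followed by a ReLU nonlinearity.
    W = rng.standard_normal((d, n_features)) / np.sqrt(d)
    relu = lambda Z: np.maximum(Z @ W, 0.0)
    Phi = relu(X_train)  # (P, N) feature matrix
    # Ridge readout: w = (Phi^T Phi + lam * I)^{-1} Phi^T y
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(n_features), Phi.T @ y_train)
    return relu(X_test) @ w

def ensemble_predict(X_train, y_train, X_test, K, n_features, lam, seed=0):
    """Ensemble K independent predictors; return both combining rules."""
    rng = np.random.default_rng(seed)
    scores = np.stack([
        random_feature_scores(X_train, y_train, X_test, n_features, lam, rng)
        for _ in range(K)
    ])
    score_avg = np.sign(scores.mean(axis=0))           # score-averaging
    majority = np.sign(np.sign(scores).sum(axis=0))    # majority vote
    return score_avg, majority
```

With odd K the majority vote is always decisive; test error under the 0-1 loss is then `np.mean(predictions != y_test)` for either combining rule.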