Random Feature Amplification: Feature Learning and Generalization in Neural Networks
Authors: Spencer Frei, Niladri S. Chatterji, Peter L. Bartlett
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics amplify these weak, random features to strong, useful features. ... We plot the decision boundary resulting from training a two-layer ReLU network given n = 5000 samples... The network was trained for T = 3000 iterations... In Figure 2, we examine the behavior of two-layer ReLU networks trained by gradient descent on the logistic loss for the 2-XOR distribution we consider when 15% of the labels are flipped... |
| Researcher Affiliation | Collaboration | Spencer Frei EMAIL Simons Institute for the Theory of Computing, University of California, Berkeley, Calvin Lab #230, Berkeley, CA 94720 Niladri S. Chatterji EMAIL Computer Science Department, Stanford University, 353 Jane Stanford Way, Stanford, CA 94305 Peter L. Bartlett EMAIL University of California, Berkeley & Google DeepMind, 367 Evans Hall #3860, Berkeley, CA 94720 |
| Pseudocode | No | The paper describes algorithms and methods in prose, but does not include any distinct, labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about providing source code for the described methodology, nor does it include a link to a code repository. |
| Open Datasets | No | The paper describes using a synthetic dataset generated by an XOR-like function of input features from a uniform mixture of four clusters. It does not provide access information (link, DOI, citation) to a publicly available dataset, nor does it make its generated data publicly available. |
| Dataset Splits | No | The paper states, "Validation accuracy is measured using n = 6000 samples." and for Figure 1, "given n = 5000 samples". However, it does not explicitly provide details on how these samples are split into training, validation, or test sets (e.g., percentages, methodology, or specific files for custom splits). The training data is described as "generated as i.i.d. samples from P" and test error is defined over the distribution P, which does not constitute a fixed dataset split. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or other machine specifications used for running the experiments. |
| Software Dependencies | No | The paper does not list any specific software components with version numbers (e.g., programming languages, libraries, frameworks, or specialized solvers) that were used in the experimental setup. |
| Experiment Setup | Yes | In Figure 1, the paper states: "T = 3000 iterations, with network width m = 500, step-size α = 0.05, and initialization variance ω²_init = 1/(32m)". Appendix E, for Figure 2, states: "m = 400 neurons. The within-cluster distribution is Gaussian, P_clust = N(0, σ²I_d), where the within-cluster variance is given by σ² = 1/d^1.2, and we flip 15% of the labels within each cluster to the orthogonal cluster's label. We initialize using centered Gaussians with variance ω²_init = 0.01/md and run with a step-size of α = 0.1." |
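
The hyperparameters reported above are enough to sketch the experimental setup. The following is an illustrative reconstruction, not the authors' code: the cluster means at ±e₁/±e₂, the fixed ±1/m second layer, training only the first-layer weights, and all function names are our assumptions; the defaults mirror the Figure 1 values (m = 500, α = 0.05, T = 3000, ω²_init = 1/(32m)).

```python
import numpy as np

def make_2xor(n, d, sigma2, noise_rate, rng):
    """Sample n points from a 2-XOR cluster mixture in R^d.

    Clusters sit at +/-e1 and +/-e2; labels follow an XOR-like rule
    (+1 on the +/-e1 clusters, -1 on the +/-e2 clusters), and a
    noise_rate fraction of labels is flipped, as in the paper's
    noisy-label setting. The unit-norm cluster means are an
    illustrative choice.
    """
    mu = np.zeros((4, d))
    mu[0, 0], mu[1, 0] = 1.0, -1.0   # +/- e1 -> label +1
    mu[2, 1], mu[3, 1] = 1.0, -1.0   # +/- e2 -> label -1
    idx = rng.integers(0, 4, size=n)
    X = mu[idx] + rng.normal(0.0, np.sqrt(sigma2), size=(n, d))
    y = np.where(idx < 2, 1.0, -1.0)
    flip = rng.random(n) < noise_rate
    y[flip] *= -1.0
    return X, y

def train_two_layer_relu(X, y, m=500, alpha=0.05, T=3000,
                         w2_init=None, rng=None):
    """Full-batch gradient descent on the logistic loss for
    f(x) = sum_j a_j * relu(w_j . x).

    Only the first-layer weights W are trained; the second layer is
    frozen at random signs a_j = +/-1/m (an assumption on our part
    about which layers are trained).
    """
    if rng is None:
        rng = np.random.default_rng(0)
    n, d = X.shape
    if w2_init is None:
        w2_init = 1.0 / (32 * m)     # Figure 1 initialization variance
    W = rng.normal(0.0, np.sqrt(w2_init), size=(m, d))
    a = rng.choice([-1.0, 1.0], size=m) / m
    for _ in range(T):
        Z = X @ W.T                      # (n, m) pre-activations
        f = np.maximum(Z, 0.0) @ a       # network outputs
        g = -y / (1.0 + np.exp(y * f))   # d(logistic loss)/df
        # d(loss)/dW, using relu'(z) = 1[z > 0]
        G = ((g[:, None] * (Z > 0)) * a[None, :]).T @ X / n
        W -= alpha * G
    return W, a

def predict(W, a, X):
    """Sign of the network output on each row of X."""
    return np.sign(np.maximum(X @ W.T, 0.0) @ a)
```

With these pieces, a run resembling Figure 1 would draw n = 5000 training samples, train for T = 3000 steps, and measure validation accuracy on a fresh draw of n = 6000 samples; note that no linear classifier can beat chance on this distribution, which is what makes the trained ReLU network's near-noise-rate error notable.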