Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods

Authors: Hossein Taheri, Christos Thrampoulidis, Arya Mazumdar

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present numerical results on the behavior of the generalization bound derived in Theorem 2.1 for real-world data (Fashion MNIST and MNIST datasets) and compare it with the empirical generalization gap. Experiments on learning under NTK with small step-size. Experiments on learning the XOR distribution with large step-size. Figure 4 demonstrates the test error curves associated with learning the XOR distribution according to the setting of Theorem 2.4.
Researcher Affiliation | Academia | Hossein Taheri, Department of Computer Science and Engineering, University of California, San Diego; Christos Thrampoulidis, Department of Electrical and Computer Engineering, University of British Columbia; Arya Mazumdar, Department of Computer Science and Engineering, University of California, San Diego.
Pseudocode | No | The paper describes algorithms in text, e.g., "w_{t+1} = w_t − η∇F̂(w_t)" for gradient descent and "w_{t+1} = w_t − η∇F̂_B(w_t)" for mini-batch SGD, but does not include any clearly labeled pseudocode or algorithm blocks.
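The two update rules quoted in this row can be written out as a minimal numpy sketch. The function names, the logistic-loss objective, and the batch-sampling scheme are illustrative assumptions, not pseudocode taken from the paper:

```python
import numpy as np

def logistic_loss_grad(w, X, y):
    """Gradient of the empirical logistic loss F(w) = mean(log(1 + exp(-y * Xw)))."""
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))  # per-sample dF/d(Xw)
    return X.T @ coeffs / len(y)

def gd_step(w, X, y, eta):
    """Full-batch GD: w_{t+1} = w_t - eta * grad F(w_t)."""
    return w - eta * logistic_loss_grad(w, X, y)

def sgd_step(w, X, y, eta, batch, rng):
    """Mini-batch SGD: same update, gradient taken over a random batch B_t."""
    idx = rng.choice(len(y), size=batch, replace=False)
    return w - eta * logistic_loss_grad(w, X[idx], y[idx])
```

Run on linearly separable toy data, a few hundred of either step drives the training loss down, which is all this sketch is meant to demonstrate.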
Open Source Code | No | The paper does not contain any statements about releasing code or links to a code repository.
Open Datasets | Yes | In this section, we present numerical results on the behavior of the generalization bound derived in Theorem 2.1 for real-world data (Fashion MNIST and MNIST datasets) and compare it with the empirical generalization gap. Experiments on learning the XOR distribution with large step-size. Figure 4 demonstrates the test error curves associated with learning the XOR distribution according to the setting of Theorem 2.4.
Dataset Splits | No | Figure 1: Iteration-based distance from initialization (‖w_t − w_0‖), training loss, test loss and generalization gap (i.e., test loss − train loss) for training a two-hidden-layer neural network on the Fashion MNIST dataset with two choices of step-size. Here n = 12×10³, m = 500, and the total number of parameters p ≈ 6×10⁵. Figure 2: Iteration-based distance from initialization, training loss, test loss and generalization gap for training a two-hidden-layer neural network on the Fashion MNIST dataset with m = 250, 500. Here n = 4×10³, p ≈ 2×10⁵ (blue line), 6×10⁵ (red line), and η = 0.02. Figure 3: Iteration-based distance from initialization, training loss, test loss and generalization gap for training a two-hidden-layer neural network on the MNIST dataset with m = 300, 600. Here n = 2×10³, p ≈ 3×10⁵ (blue line), 8×10⁵ (red line), and η = 0.02. Figure 4: Left: Misclassification error by iteration in learning the d-dimensional XOR distribution with SGD. Right: Total number of SGD steps versus data dimension to reach approximately zero test error. In particular, we fix n = 6d, η = m = 20 and set the total number of SGD steps as T = ⌈log(d)⌉. Note that the number of iterations required to reach perfect accuracy grows with d. The right side of Figure 4 provides further insight into the relationship between dimensionality and convergence rate: it displays the total number of SGD steps required to reach a test error below 0.01 for different values of d, using n = 3d, m = η = 20.
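For context on the Figure 4 experiments, here is one common construction of a d-dimensional XOR distribution: sample x uniformly from the hypercube {−1, +1}^d and label it by the product (XOR) of its first two coordinates. This construction is an assumption for illustration; the paper's exact data model is the one specified in its Theorem 2.4.

```python
import numpy as np

def sample_xor(n, d, rng):
    """Sample n points of a d-dimensional XOR-type distribution (illustrative):
    x uniform on {-1, +1}^d, label y = x_1 * x_2."""
    X = rng.choice([-1.0, 1.0], size=(n, d))
    y = X[:, 0] * X[:, 1]
    return X, y
```

Only the first two coordinates carry signal; the remaining d − 2 act as noise dimensions, which is what makes the problem harder as d grows.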
Hardware Specification | No | The paper does not provide specific hardware details used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | We consider binary classification with a 2-hidden-layer network with softplus activation (σ(t) = log(1 + eᵗ)) trained by the logistic loss function. Figure 1 presents the train, test and generalization behavior of GD for learning such a model on the Fashion MNIST dataset. The two lines in each figure correspond to η = 0.01, 0.1. Experiments on learning the XOR distribution with large step-size. Figure 4 demonstrates the test error curves associated with learning the XOR distribution according to the setting of Theorem 2.4. In particular, we fix n = 6d, η = m = 20 and set the total number of SGD steps as T = ⌈log(d)⌉.
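The model described in this row (a 2-hidden-layer network with softplus activation and logistic loss) can be sketched in numpy as below. The layer shapes, the scalar output head a, and the absence of biases are illustrative assumptions, not details taken from the paper:

```python
import numpy as np

def softplus(t):
    """Numerically stable softplus: log(1 + exp(t))."""
    return np.maximum(t, 0.0) + np.log1p(np.exp(-np.abs(t)))

def forward(params, X):
    """Two-hidden-layer net: f(x) = a^T softplus(W2 softplus(W1 x))."""
    W1, W2, a = params
    return softplus(softplus(X @ W1.T) @ W2.T) @ a

def logistic_loss(params, X, y):
    """Average logistic loss log(1 + exp(-y f(x))), computed stably."""
    m = y * forward(params, X)
    return np.mean(np.maximum(-m, 0.0) + np.log1p(np.exp(-np.abs(m))))
```

At an all-zero initialization the network output is identically 0, so the logistic loss equals log 2 regardless of the labels, which is a quick sanity check for the implementation.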