Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks

Authors: Fanghui Liu, Leello Dadi, Volkan Cevher

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct numerical experiments to validate our theoretical results from the perspective of the convergence rate of the excess risk. To validate whether the derived (sharper) convergence rate is attainable, we construct a simple synthetic dataset with a known f_ρ in the over-parameterized regime. To be specific, we assume that the data are sampled from a standard Gaussian distribution, i.e., x ~ N(0, I_d), and normalized so that ||x||_2 = 1. The feature dimension is d = 3, a low-dimensional setting chosen to ensure that P in Theorem 14 is not large, as mentioned before. We set the number of training points to range from 10 to 1000 while the number of test points is held fixed at 20. Albeit simple, such an experimental setting still works in the over-parameterized regime; see Table 1 (Left). We consider the noiseless case, where the target function is generated by a single ReLU, i.e., y = f_ρ(x) = σ(⟨w, x⟩) with w ~ N(0, I_d). The regularization parameter is set to λ = 10^-8 for both methods: kernel ridge regression via the NTK and the path-norm-based algorithm. We solve the convex program in Eq. (17) using CVX (Grant and Boyd, 2014) to obtain the exact global minimum and then compute the test MSE for regression over 5 runs. The (middle) figure of Table 1 shows that, when learning a single ReLU beyond the RKHS, our algorithm still achieves the same convergence rate as the NTK in the RKHS regime. This is because the input dimension d = 3 is not large, so there is no significant difference in the convergence rate. Besides, we also conduct this experiment on a real-world dataset, the UCI ML Breast Cancer dataset, with 569 samples and dimension d = 30. We use 80% of the samples for training and 20% for testing. Here the number of training data ranges from 40 to 300, and the number of test data ranges from 10 to 75, accordingly. The remaining experimental settings are the same as for the synthetic dataset.
The (right) figure of Table 1 shows that, when increasing the number of training data, the test MSE of the NTK decreases only slightly, whereas the path-norm-based algorithm achieves a significantly lower test MSE, which demonstrates the attainability of our theoretical results. Nevertheless, we also need to point out that the path-norm-based algorithm is quite inefficient and unstable when compared to the NTK: its performance relies on an extremely accurate solution from CVX, which restricts the utility of this convex program in practice. Additionally, we remark that we do not claim this algorithm is better than SGD.
Researcher Affiliation | Academia | Fanghui Liu, Department of Computer Science, University of Warwick, Coventry, UK; Leello Dadi, Lab for Information and Inference Systems, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland; Volkan Cevher, Lab for Information and Inference Systems, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
Pseudocode | No | The paper describes a 'computational algorithm' in Section 4.4 and Appendix B, with mathematical formulations of the optimization problems (e.g., Eqs. 10 and 17). However, it does not provide a clearly labeled 'Pseudocode' or 'Algorithm' block with the structured, step-by-step instructions typical of pseudocode.
Open Source Code | No | The paper mentions using 'CVX (Grant and Boyd, 2014)' in the numerical validation section, which is a third-party tool. There is no explicit statement about releasing the authors' own implementation code, nor any links to a code repository.
Open Datasets | Yes | Besides, we also conduct this experiment on a real-world dataset, the UCI ML Breast Cancer dataset, with 569 samples and dimension d = 30.
Dataset Splits | Yes | We set the number of training points to range from 10 to 1000 while the number of test points is held fixed at 20. [...] We use 80% of the samples for training and 20% for testing. Here the number of training data ranges from 40 to 300, and the number of test data ranges from 10 to 75, accordingly.
Hardware Specification | No | The paper includes a section on 'Numerical Validation' but does not specify any hardware components such as GPU/CPU models, memory, or specific computing environments used for the experiments.
Software Dependencies | No | We solve the convex program in Eq. (17) using CVX (Grant and Boyd, 2014). The paper mentions CVX but does not provide a version number for it or for any other software used.
Experiment Setup | Yes | The regularization parameter is set to λ = 10^-8 for both methods: kernel ridge regression via the NTK and the path-norm-based algorithm. We set the number of training points to range from 10 to 1000 while the number of test points is held fixed at 20. [...] We use 80% of the samples for training and 20% for testing. Here the number of training data ranges from 40 to 300, and the number of test data ranges from 10 to 75, accordingly.
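As a rough illustration of the synthetic protocol quoted above (Gaussian inputs normalized to the unit sphere, a noiseless single-ReLU target, and kernel ridge regression via the NTK with λ = 10^-8), the sketch below reproduces the data generation and the NTK baseline only. The closed-form two-layer ReLU NTK used here is one standard formulation for unit-norm inputs and is our assumption, as are the seed and the sample size of 200; the paper's path-norm convex program (Eq. 17, solved with CVX) is not reproduced.

```python
import numpy as np

def ntk_two_layer_relu(X1, X2):
    """One common closed form of the two-layer ReLU NTK on unit-norm inputs
    (an assumption for illustration, not necessarily the paper's exact kernel)."""
    u = np.clip(X1 @ X2.T, -1.0, 1.0)          # pairwise cosines
    kappa0 = (np.pi - np.arccos(u)) / (2 * np.pi)
    kappa1 = (u * (np.pi - np.arccos(u)) + np.sqrt(1.0 - u**2)) / (2 * np.pi)
    return u * kappa0 + kappa1

rng = np.random.default_rng(0)                  # hypothetical seed
d, n_train, n_test, lam = 3, 200, 20, 1e-8

def make_data(n):
    # x ~ N(0, I_d), then normalized so that ||x||_2 = 1
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

# Noiseless single-ReLU target: y = sigma(<w, x>) with w ~ N(0, I_d)
w = rng.standard_normal(d)
X_tr, X_te = make_data(n_train), make_data(n_test)
y_tr = np.maximum(X_tr @ w, 0.0)
y_te = np.maximum(X_te @ w, 0.0)

# Kernel ridge regression with the NTK (the paper's baseline method)
K = ntk_two_layer_relu(X_tr, X_tr)
alpha = np.linalg.solve(K + lam * n_train * np.eye(n_train), y_tr)
y_pred = ntk_two_layer_relu(X_te, X_tr) @ alpha
mse = np.mean((y_pred - y_te) ** 2)
print(f"test MSE: {mse:.2e}")
```

Sweeping `n_train` over the paper's range (10 to 1000) and plotting the resulting test MSE would give the convergence-rate curve that Table 1 (middle) reports for the NTK.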