Multiple Descent in the Multiple Random Feature Model
Authors: Xuran Meng, Jianfeng Yao, Yuan Cao
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then provide a thorough experimental study to verify our theory. Finally, we extend our study to the multiple random feature model (MRFM), and show that MRFMs ensembling K types of random features may exhibit (K + 1)-fold descent. Our analysis points out that risk curves with a specific number of descents generally exist in learning multi-component prediction models. Figure 3: Examples of double and triple descent. (a) gives the excess risk of a random feature model with ReLU activation function; (b) shows the excess risk of a double random feature model with ReLU and sigmoid activation functions; (c) shows the excess risk of a double random feature model with ELU and ReLU activation functions. The x-axis is the model complexity (number of parameters/sample size) and the y-axis is the excess risk. The curves give our theoretical predictions, and the dots are our numerical results. |
| Researcher Affiliation | Academia | Xuran Meng EMAIL Department of Statistics and Actuarial Science The University of Hong Kong Jianfeng Yao EMAIL School of Data Science The Chinese University of Hong Kong (Shenzhen) Yuan Cao EMAIL Department of Statistics and Actuarial Science The University of Hong Kong |
| Pseudocode | No | The paper describes methods and derivations mathematically and textually, but does not include any clearly labeled "Pseudocode" or "Algorithm" blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Meng, Xuran, Jianfeng Yao, and Yuan Cao. Online supplementary material to Multiple Descent in the Multiple Random Feature Model. URL https://github.com/XuranMeng/Multipledescent/blob/main/onlinesupplementary.pdf. |
| Open Datasets | No | The distribution of the data pair (x, y) is given as follows: 1. The input vector x follows the uniform distribution on the sphere √d·S^{d−1} of radius √d. 2. The output is y = ⟨β_{1,d}, x⟩ + F0 + ε, where β_{1,d} ∈ R^d, F0 ∈ R, and ε is a noise independent of x. We assume that E(ε) = 0, E(ε²) = τ², and E(ε⁴) < +∞. The parameters of the data generation model are β_d = [F0, β_{1,d}ᵀ]ᵀ, and we hereafter denote by D(β_d) the probability distribution of the pair (x, y). This data generation model is standard in recent literature on double descent. Similar settings have been studied in a number of recent works (Hamsici and Martinez, 2007; Marinucci and Peccati, 2011; Di Marzio et al., 2014; Mei and Montanari, 2022). |
| Dataset Splits | Yes | Given a training data set S = {(x_i, y_i)}_{i=1}^n consisting of n independent samples from the data generation model in Definition 2.1... Training data {(x_i, y_i)}_{i=1}^n are generated independently following Definition 2.1 with τ = 0.1: each x_i is uniformly generated from the sphere √d·S^{d−1}, and the corresponding response is given as y_i = ⟨β_1, x_i⟩ + F0 + ε_i, where β_1 is a randomly chosen unit vector; F0 = 0.2, λ = 10^{−5}; training sample size n = 1000, data dimension d = 300, and N1 = N2 varying from 0 to 1.6n. As we gradually increase the dimensions of random features N1 = N2 from 0 to 1.6n, the model complexity parameter c(d) = (N1 + N2)/n varies from 0 to 3.2. The empirical and finite-horizon values for the limiting excess risk R(λ, ψ, µ, F1, τ) in Theorem 3.6 are obtained on a test data set of size 700 and averaged over 30 independent replications. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details, such as library names with version numbers (e.g., Python 3.8, PyTorch 1.9, etc.), or specific solvers with versions. |
| Experiment Setup | Yes | Training data {(x_i, y_i)}_{i=1}^n are generated independently following Definition 2.1 with τ = 0.1: each x_i is uniformly generated from the sphere √d·S^{d−1}, and the corresponding response is given as y_i = ⟨β_1, x_i⟩ + F0 + ε_i, where β_1 is a randomly chosen unit vector; F0 = 0.2, λ = 10^{−5}; training sample size n = 1000, data dimension d = 300, and N1 = N2 varying from 0 to 1.6n. The experiment setups are the same as the experiments in Section 4.2, except that here we use different pairs of activation functions. For two activation functions σ1, σ2, we gradually decrease the scale of σ2 by using activation pairs (σ1(x), c0·σ2(x)) with a smaller and smaller factor c0. The experimental setting is similar to the previous experiments reported in Section 4. We set d = 300, n = 1000, and λ = 10^{−4}. In simulation, the training data {(x_i, y_i)}_{i=1}^n are generated independently according to Definition 2.1: each x_i is uniformly generated from the sphere √d·S^{d−1}, and the corresponding response is given as y_i = ⟨β_1, x_i⟩ + F0 + ε_i, where β_1 is a randomly chosen unit vector, F0 = 0.2 and τ = 0.1. We consider two MRFMs with K = 3 and K = 4, respectively. For the case K = 3, we consider three activation functions σ1(x) = ReLU(9x), σ2(x) = ReLU(x) and σ3(x) = ReLU(0.1x), and set the ratios between the dimensions of random features as N1 = N2 = N3/3. For the case K = 4, we use four activation functions σ1(x) = ReLU(80x), σ2(x) = ReLU(9x), σ3(x) = ReLU(x) and σ4(x) = ReLU(0.1x), and keep the ratios N1 = N2 = N3 = N4/3. |
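The data generation model quoted above (Definition 2.1) is concrete enough to sketch in code. The following is a minimal sketch, not the authors' released code: the function name `sample_data` is ours, and Gaussian noise is used as one admissible choice of ε satisfying the stated moment conditions E(ε) = 0, E(ε²) = τ², E(ε⁴) < +∞.

```python
import numpy as np

def sample_data(n, d, beta1, F0=0.2, tau=0.1, rng=None):
    """Sample n pairs (x, y): x uniform on the sphere of radius sqrt(d),
    y = <beta1, x> + F0 + eps with eps ~ N(0, tau^2)."""
    rng = np.random.default_rng(rng)
    g = rng.standard_normal((n, d))
    # normalizing a Gaussian vector gives a uniform direction; rescale to radius sqrt(d)
    x = np.sqrt(d) * g / np.linalg.norm(g, axis=1, keepdims=True)
    eps = tau * rng.standard_normal(n)
    y = x @ beta1 + F0 + eps
    return x, y

rng = np.random.default_rng(0)
d = 300
beta1 = rng.standard_normal(d)
beta1 /= np.linalg.norm(beta1)  # "randomly chosen unit vector" as in the setup
x, y = sample_data(1000, d, beta1, F0=0.2, tau=0.1, rng=rng)
```

With this scaling, E[x xᵀ] = I_d, so the signal ⟨β_1, x⟩ has unit variance regardless of d.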
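The double random feature experiment (panel (b) of Figure 3: ReLU plus sigmoid features, ridge-regularized least squares) can likewise be sketched. This is an illustrative reconstruction under our own assumptions, not the paper's implementation: the rows of the random weight matrices are drawn uniformly on the unit sphere, and the ridge normalization (Zᵀ Z + n λ I)⁻¹ Zᵀ y is one common convention that the excerpt does not pin down.

```python
import numpy as np

def relu(z): return np.maximum(z, 0.0)
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d, N1, N2, lam, F0, tau = 1000, 300, 400, 400, 1e-5, 0.2, 0.1

# data per Definition 2.1 (Gaussian noise as one admissible choice)
beta1 = rng.standard_normal(d); beta1 /= np.linalg.norm(beta1)
def sample(m):
    g = rng.standard_normal((m, d))
    x = np.sqrt(d) * g / np.linalg.norm(g, axis=1, keepdims=True)
    return x, x @ beta1 + F0 + tau * rng.standard_normal(m)
x_tr, y_tr = sample(n)
x_te, y_te = sample(700)  # test set of size 700, as in the quoted setup

# two families of random features; each weight row is uniform on the unit sphere
W1 = rng.standard_normal((N1, d)); W1 /= np.linalg.norm(W1, axis=1, keepdims=True)
W2 = rng.standard_normal((N2, d)); W2 /= np.linalg.norm(W2, axis=1, keepdims=True)
def feats(x):
    return np.hstack([relu(x @ W1.T), sigmoid(x @ W2.T)])

Z_tr, Z_te = feats(x_tr), feats(x_te)
N = N1 + N2
# ridge solution; the n*lam scaling of the penalty is our assumption
a = np.linalg.solve(Z_tr.T @ Z_tr + n * lam * np.eye(N), Z_tr.T @ y_tr)
test_mse = np.mean((Z_te @ a - y_te) ** 2)
```

Sweeping N1 = N2 from 0 to 1.6n and plotting the test risk against c(d) = (N1 + N2)/n would trace out the descent curve studied in the paper.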