Functional Linear Regression with Mixed Predictors

Authors: Daren Wang, Zifeng Zhao, Yi Yu, Rebecca Willett

JMLR 2022

Reproducibility Variable Result LLM Response
Research Type: Experimental. "Simulation studies and real data applications illustrate the promising performance of the proposed approach compared to the state-of-the-art methods in the literature." Numerical results: "In this section, we conduct extensive numerical experiments to investigate the performance of the proposed RKHS-based penalized estimator (hereafter RKHS) for the functional linear regression with mixed predictors. Sections 5.1–5.3 compare RKHS with popular methods in the literature via simulation studies. Section 5.4 presents a real data application on crowdfunding prediction to further illustrate the potential utility of the proposed method."
Researcher Affiliation: Academia. Daren Wang (EMAIL), Department of ACMS, University of Notre Dame, Indiana, USA; Zifeng Zhao (EMAIL), Mendoza College of Business, University of Notre Dame, Indiana, USA; Yi Yu (EMAIL), Department of Statistics, University of Warwick, Coventry, UK; Rebecca Willett (EMAIL), Department of Statistics, University of Chicago, Illinois, USA.
Pseudocode: Yes.

Algorithm 1: Iterative coordinate descent
 1: input: observations {X_t(s_i), Z_t, Y_t(r_j)} for t = 1, …, T, i = 1, …, n1, j = 1, …, n2; tuning parameters (λ1, λ2, λ3); maximum number of iterations Lmax; tolerance ε.
 2: initialization: L = 1, B^0 = R^0 = 0.
 3: repeat  ▷ first-level block coordinate descent
 4:   Given B = B^{L−1}, update R^L via the ridge regression formulation (16).
 5:   Given R = R^L, set Ỹ = Y − (1/n1) K1 R K2 X and initialize H = K1 B^{L−1}.
 6:   repeat  ▷ second-level coordinate descent
 7:     for l = 1, 2, …, p do
 8:       Given {h_j : j ≠ l}, set Ỹ_t^l = Ỹ_t − Σ_{j≠l} Z_tj h_j for t = 1, …, T.
 9:       if min_s {(1/2)‖Σ_{t=1}^T Z_tl Ỹ_t^l − s‖_2^2 + λ3 n ‖s‖_2} is attained at s = 0 then
10:         update h_l = 0.
11:       else
12:         repeat  ▷ third-level coordinate descent
13:           for k = 1, 2, …, n2 do
14:             Given {h_lj : j ≠ k}, update h_lk via the one-dimensional optimization (19).
15:           end for
16:         until the decrease of the function value (18) is < ε.
17:       end if
18:     end for
19:   until the decrease of the function value (17) is < ε.
20:   Update B^L = K1^{−1} H and set L ← L + 1.
21: until the decrease of the function value (14) is < ε or L ≥ Lmax.
22: output: R̂ = R^L and B̂ = B^L.
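The nested loops of the algorithm above can be sketched with a generic group-lasso stand-in. This is a structural illustration only, assuming a plain squared-error objective (1/2T)‖Ỹ − ZH‖² + λ3 Σ_l ‖h_l‖ for the second-level loop; the paper's actual objectives (16)–(19) also involve the kernel matrices K1 and K2, which are omitted here, and all variable names are hypothetical.

```python
import numpy as np

def group_cd(Z, Ytil, lam3, eps=1e-8, max_sweeps=100):
    """Structural sketch of the second-level loop (steps 6-19):
    cyclic updates over the scalar-covariate blocks h_1, ..., h_p with a
    group-lasso zero check. Illustrative stand-in only, not the paper's
    exact RKHS objective."""
    T, p = Z.shape
    n2 = Ytil.shape[1]
    H = np.zeros((p, n2))          # rows are the blocks h_1, ..., h_p
    for _ in range(max_sweeps):
        H_old = H.copy()
        for l in range(p):
            # Partial residual excluding block l (analogue of step 8).
            R = Ytil - Z @ H + np.outer(Z[:, l], H[l])
            g = Z[:, l] @ R / T            # blockwise gradient direction
            norm_g = np.linalg.norm(g)
            if norm_g <= lam3:
                H[l] = 0.0                 # zero check (analogue of steps 9-10)
            else:
                # Closed-form group soft-threshold update, standing in for
                # the inner one-dimensional updates (steps 12-16).
                z2 = np.mean(Z[:, l] ** 2)
                H[l] = (1.0 - lam3 / norm_g) * g / z2
        if np.linalg.norm(H - H_old) < eps:   # analogue of the ε stopping rule
            break
    return H
```

With a large λ3 the zero check fires for every block and the estimate is exactly sparse; with λ3 near zero the updates converge to the least-squares fit.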
Open Source Code: Yes. "The implementations of our numerical experiments can be found at https://github.com/darenwang/functional_regression."
Open Datasets: No. "We consider a novel dataset collected from one of the largest crowdfunding websites, kickstarter.com, which provides an online platform for creators, e.g., start-ups, to launch fundraising campaigns for developing a new product such as electronic devices and card games."
Dataset Splits: Yes. "Evaluation criteria: We evaluate the performance of the estimator by its excess risk. Specifically, given the sample size (n, T), we simulate observations {X_t(s_i), Z_t, Y_t(r_j)} for t = 1, …, 1.5T and i, j = 1, …, n, which are then split into the training data (t = 1, …, T) for constructing the estimator (Â, β̂) and the test data (t = T + 1, …, 1.5T) for the evaluation of the excess risk. … A standard 5-fold cross-validation (CV) on the training data is used to select the tuning parameters (λ1, λ2, λ3). … To assess the out-of-sample performance of each method, we use a 2-fold CV, where we partition the 454 campaigns into two equal-sized sets, use one set to train the functional regression and the other to test the prediction performance, and then switch the roles of the two sets."
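The split-and-swap protocol described above can be sketched as follows. The array shapes and the simulated X and Y are placeholders, not the paper's data-generating process; only the split sizes (T training, 0.5T test; two swapped folds of the 454 campaigns) come from the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder simulation: T training trajectories plus 0.5T test trajectories.
T = 200
n_total = T + T // 2                 # T + 0.5T observations in total
X = rng.normal(size=(n_total, 50))   # functional predictor on a grid (placeholder)
Y = rng.normal(size=(n_total, 50))   # functional response on a grid (placeholder)

# First T observations train the estimator; the remaining 0.5T assess excess risk.
X_train, Y_train = X[:T], Y[:T]
X_test, Y_test = X[T:], Y[T:]

def two_fold_swap(n, rng):
    """2-fold CV with role swapping: partition n samples into two
    equal-sized sets; each set serves once as training and once as test."""
    perm = rng.permutation(n)
    half = n // 2
    fold_a, fold_b = perm[:half], perm[half:]
    return [(fold_a, fold_b), (fold_b, fold_a)]

# The 454 Kickstarter campaigns are split this way in the real-data study.
splits = two_fold_swap(454, rng)
```

Averaging the test error over the two swapped folds gives the reported out-of-sample performance.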
Hardware Specification: No. The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running the experiments; it only discusses software, datasets, and general computing environments.
Software Dependencies: No. The paper mentions software packages, namely the R packages fda.usc and refund and the Matlab package PACE (its FPCreg function), but it does not specify version numbers for these dependencies, which would be needed for a fully reproducible setup.
Experiment Setup: Yes. "Implementation details of the RKHS estimator: We set K = K_β and use the rescaled Bernoulli polynomial as the reproducing kernel, such that K(x, y) = 1 + k1(x)k1(y) + k2(x)k2(y) − k4(x − y), where k1(x) = x − 0.5, k2(x) = {k1^2(x) − 1/12}/2, and k4(x) = {k1^4(x) − k1^2(x)/2 + 7/240}/24 for x ∈ [0, 1], with k4(x − y) := k4(|x − y|) for x, y ∈ [0, 1]. Such a K is the reproducing kernel for W^{2,2}; see Chapter 2.3.3 of Gu (2013) for more details. In Algorithm 1, we set the tolerance parameter ε = 10^{-8} and the maximum number of iterations Lmax = 10^4. A standard 5-fold cross-validation (CV) on the training data is used to select the tuning parameters (λ1, λ2, λ3)."
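The kernel formulas quoted above translate directly into code. A minimal Python sketch follows (the authors' released code is in R, so this is illustrative only); the resulting Gram matrix on any grid of sampling points should be symmetric and positive semidefinite, as expected of a reproducing kernel.

```python
import numpy as np

def k1(x):
    # k1(x) = x - 0.5
    return x - 0.5

def k2(x):
    # k2(x) = (1/2) * (k1(x)^2 - 1/12)
    return 0.5 * (k1(x) ** 2 - 1.0 / 12.0)

def k4(x):
    # k4(x) = (1/24) * (k1(x)^4 - k1(x)^2 / 2 + 7/240)
    return (k1(x) ** 4 - k1(x) ** 2 / 2.0 + 7.0 / 240.0) / 24.0

def K(x, y):
    # Reproducing kernel of W^{2,2}[0,1] (Gu, 2013, Chapter 2.3.3):
    # K(x, y) = 1 + k1(x) k1(y) + k2(x) k2(y) - k4(|x - y|)
    return 1.0 + k1(x) * k1(y) + k2(x) * k2(y) - k4(np.abs(x - y))

# Gram matrix on a grid of sampling points in [0, 1].
s = np.linspace(0.0, 1.0, 21)
G = K(s[:, None], s[None, :])
```

The functions are vectorized, so `K(s[:, None], s[None, :])` builds the full Gram matrix in one broadcasted call.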