reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Sparse Tensor Additive Regression

Authors: Botao Hao, Boxiang Wang, Pengyuan Wang, Jingfei Zhang, Jian Yang, Will Wei Sun

JMLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the efficacy of STAR through extensive comparative simulation studies, and an application to the click-through-rate prediction in online advertising.
Researcher Affiliation	Collaboration	Botao Hao (DeepMind, 5 New Street, London, UK); Boxiang Wang (Department of Statistics and Actuarial Science, The University of Iowa, Iowa City, IA 52242, USA); Pengyuan Wang (Department of Marketing, University of Georgia, Athens, GA 30602, USA); Jingfei Zhang (Department of Management Science, University of Miami, Coral Gables, FL 33146, USA); Jian Yang (Yahoo Research, Verizon Media, Sunnyvale, CA 94089, USA); Will Wei Sun (Krannert School of Management, Purdue University, West Lafayette, IN 47907, USA)
Pseudocode	Yes	Algorithm 1: Penalized Alternating Minimization for Solving (8). 1: Input: $\{y_i\}_{i=1}^{n}$, $\{X_i\}_{i=1}^{n}$, initialization $\{b_1^{(0)}, \ldots, b_m^{(0)}\}$, the set of penalization parameters $\{\lambda_{1n}, \ldots, \lambda_{mn}\}$, rank $R$, iteration $t = 0$, stopping error $\epsilon = 10^{-5}$. 2: Repeat $t = t + 1$ and run penalized alternating minimization. 3: For $k = 1$ to $m$: $b_k^{(t+1)} = \arg\min_{b_k} L(b_1^{(t)}, \ldots, b_m^{(t)}) + \lambda_{kn} P(b_k^{(t)})$, (11) where $L$ is defined in (10). 4: End for. 5: Until $\max_k \|b_k^{(t+1)} - b_k^{(t)}\|_2 \le \epsilon$, and let $t = T$. 6: Output: the estimate of each component, $\{b_1^{(T)}, \ldots, b_m^{(T)}\}$.
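Algorithm 1 cycles over the $m$ blocks. Each block's penalized subproblem is solved while the other blocks are held fixed, and the outer loop stops once no block moves by more than $\epsilon$ in $\ell_2$ norm. The paper releases no code, so the Python sketch below only illustrates that control flow, under loud assumptions:

- A squared-error loss stands in for $L$ in (10).
- An $\ell_1$ (lasso) penalty stands in for $P$.
- Each block subproblem is solved approximately by cyclic coordinate descent with soft-thresholding.
- STAR's tensor and spline structure is omitted entirely: each $b_k$ is simply a coefficient vector on its own design matrix.

The names `alternating_minimization` and `soft_threshold` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def soft_threshold(z: float, gamma: float) -> float:
    """Proximal operator of gamma * |x| (the lasso shrinkage step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def alternating_minimization(
    y: Sequence[float],
    blocks: Sequence[Matrix],
    lambdas: Sequence[float],
    eps: float = 1e-5,
    max_iter: int = 500,
    inner_sweeps: int = 50,
) -> Tuple[List[List[float]], int]:
    """Hypothetical sketch of penalized alternating minimization.

    Assumed objective (not the paper's L and P):
        (1/2n) * ||y - sum_k X_k b_k||^2 + sum_k lambdas[k] * ||b_k||_1,
    where blocks[k][i][j] is feature j of block k for sample i.
    Returns the block estimates and the number of outer iterations run.
    """
    n, m = len(y), len(blocks)
    b = [[0.0] * len(blocks[k][0]) for k in range(m)]  # initialization b^(0) = 0
    r = list(y)  # residual y - sum_k X_k b_k (all blocks start at zero)
    for t in range(1, max_iter + 1):
        previous = [list(bk) for bk in b]
        for k in range(m):
            X = blocks[k]
            # Approximate argmin over b_k with every other block held at its
            # latest value, via cyclic coordinate descent on the lasso subproblem.
            for _ in range(inner_sweeps):
                for j in range(len(b[k])):
                    col = [X[i][j] for i in range(n)]
                    sq = sum(c * c for c in col) / n
                    if sq == 0.0:
                        continue
                    # Correlation of column j with the partial residual
                    # (coordinate j's own contribution added back).
                    rho = sum(col[i] * (r[i] + col[i] * b[k][j]) for i in range(n)) / n
                    new = soft_threshold(rho, lambdas[k]) / sq
                    delta = new - b[k][j]
                    if delta != 0.0:
                        for i in range(n):
                            r[i] -= col[i] * delta  # keep the residual in sync
                        b[k][j] = new
        # Stopping rule of step 5: max_k ||b_k^(t+1) - b_k^(t)||_2 <= eps.
        change = max(
            math.sqrt(sum((u - v) ** 2 for u, v in zip(b[k], previous[k])))
            for k in range(m)
        )
        if change <= eps:
            return b, t
    return b, max_iter
```

Consider two orthogonal single-column blocks $x_1 = (1, 1, -1, -1)$ and $x_2 = (1, -1, 1, -1)$ with $y = 2x_1 - 3x_2$. With both penalties set to 0.5, the sketch returns $b_1 = 1.5$ and $b_2 = -2.5$, so each least-squares coefficient is shrunk toward zero by the penalty level. The inner coordinate-descent loop is one choice among many for the block subproblem; any solver for the block-wise penalized problem fits the same outer loop.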
Open Source Code	No	No explicit statement or link to the authors' source code for the methodology described in the paper was found.
Open Datasets	No	The reported data and results in this section are deliberately incomplete and subject to anonymization, and thus do not necessarily reflect the real portfolio at any particular time.
Dataset Splits	Yes	For both STAR and TLR, five-fold cross-validation is employed to select the best pair of the tuning parameters R and λ. We train and tune each method on the data obtained on the first 24 days, and use the remaining data as the test data to assess the prediction accuracy.
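The protocol above has three parts: train and tune on days 1 to 24, hold out the later days for testing, and run five-fold cross-validation inside the training period. The Python sketch below illustrates it under stated assumptions. It is not the authors' code. The per-record day labels, the helper names `split_by_day` and `k_fold_indices`, and the use of contiguous, unshuffled folds are all hypothetical.

```python
from typing import List, Sequence, Tuple


def split_by_day(days: Sequence[int], cutoff: int = 24) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: record indices from days 1..cutoff (train) and after it (test)."""
    train = [i for i, d in enumerate(days) if d <= cutoff]
    test = [i for i, d in enumerate(days) if d > cutoff]
    return train, test


def k_fold_indices(indices: Sequence[int], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: split indices into k contiguous folds.

    Returns (fit, validate) index pairs; fold sizes differ by at most one.
    """
    n = len(indices)
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= len(indices)")
    sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    folds, start = [], 0
    for size in sizes:
        validate = list(indices[start:start + size])
        fit = list(indices[:start]) + list(indices[start + size:])
        folds.append((fit, validate))
        start += size
    return folds
```

Each (R, λ) candidate would be fit on `fit` and scored on `validate` in every fold. The pair with the lowest average validation error would then be refit on all training days before it is evaluated on the held-out days.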
Hardware Specification	Yes	The experiment was conducted using a single-processor Intel(R) Xeon(R) CPU E5-2600 @ 2.60GHz.
Software Dependencies	No	The paper mentions "R package glmnet (Friedman et al., 2010)" but does not provide a specific version number for this or any other software dependency.
Experiment Setup	Yes	Natural cubic splines with a B-spline basis are used in STAR with the degree fixed to be five, which amounts to having four inner knots. For both STAR and TLR, five-fold cross-validation is employed to select the best pair of the tuning parameters R and λ, where the tensor rank R is chosen from {2, 3} and λ is selected from a sequence that is uniformly distributed on the logarithmic scale in the interval $[10^{-5}, 1]$. For GP and AMP, as suggested by Kanagawa et al. (2016), the Gaussian kernel is used and the bandwidth is set to 100; five-fold cross-validation is used to select λ from the same range used for TLR and STAR.
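The tuning protocol for STAR and TLR reduces to a small grid search. R ranges over {2, 3}, and λ ranges over values evenly spaced on the log scale in $[10^{-5}, 1]$. The minimal Python sketch below makes two assumptions. The grid size `num=20` is a placeholder, since the excerpt does not state how many λ values were used. `cv_error` is a hypothetical callback standing in for the five-fold cross-validated error of whichever model is being tuned.

```python
import math
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple


def log_uniform_grid(lo: float = 1e-5, hi: float = 1.0, num: int = 20) -> List[float]:
    """num values equally spaced in log10 between lo and hi (both endpoints included)."""
    if num < 2 or not 0 < lo < hi:
        raise ValueError("need num >= 2 and 0 < lo < hi")
    a, b = math.log10(lo), math.log10(hi)
    step = (b - a) / (num - 1)
    return [10 ** (a + i * step) for i in range(num)]


def select_tuning(
    cv_error: Callable[[int, float], float],
    ranks: Sequence[int] = (2, 3),
    lambdas: Optional[Sequence[float]] = None,
) -> Tuple[int, float]:
    """Return the (R, lambda) pair with the smallest user-supplied CV error."""
    grid = log_uniform_grid() if lambdas is None else lambdas
    return min(product(ranks, grid), key=lambda pair: cv_error(*pair))
```

Spacing λ on the log scale spends as many candidates between $10^{-5}$ and $10^{-4}$ as between 0.1 and 1. This suits penalties whose effect varies over several orders of magnitude.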