Optimal Parameter-Transfer Learning by Semiparametric Model Averaging

Authors: Xiaonan Hu, Xinyu Zhang

JMLR 2023

Reproducibility assessment: variable, result, and supporting LLM response.
Research Type: Experimental. Evidence: "Extensive numerical results demonstrate the superiority of the proposed method over competitive methods." ... "In Section 4, we evaluate the finite sample performance of our procedure in various numerical experiments." ... "In Section 5, we apply our approach to analyze housing rental information data in Beijing."
Researcher Affiliation: Academia. Evidence: Xiaonan Hu: School of Mathematical Sciences, Capital Normal University, Beijing 100048, China; Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China. Xinyu Zhang: Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China; International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, Anhui, China.
Pseudocode: Yes. Evidence: Algorithm 1 (Trans-SMAP). Input: training samples {(x_i^(m), z_i^(m), y_i^(m)); i = 1, ..., n_m, m = 0, ..., M} from the target and source models (1), and the new sample {x_{n0+1}^(0), z_{n0+1}^(0)} from the target model. Output: prediction of y_{n0+1}^(0) associated with the new sample {x_{n0+1}^(0), z_{n0+1}^(0)}.
Open Source Code: No. Evidence: the paper contains no explicit statement that the authors release their own source code for the described methodology, and no direct link to a code repository. It mentions that "All experiments are implemented in R software" (Section 4.1) and refers to third-party R packages used (Appendix B.1), but this is not the authors' implementation code.
Open Datasets: Yes. Evidence: "In this section, we apply our approach to analyze housing rental information data in Beijing, which is drawn from a publicly available data set on http://www.idatascience.cn/dataset."
Dataset Splits: Yes. Evidence: "To determine a proper choice of weights, we adopt a J-fold (J > 1) cross-validation criterion. Specifically, we randomly divide the target samples into J mutually exclusive subgroups G_1, ..., G_J." ... "Remark 2: The choice of J in criterion (4) is usually uncertain in practice. Since there is no theoretically optimal value, we manually use the 5-fold CV criterion in terms of computational efficiency in this paper." ... "To evaluate the out-of-sample prediction risk, we randomly split the target samples into two subgroups with equal size as the training and testing data."
Hardware Specification: Yes. Evidence: "The numerical computation executes on a regular PC with an Intel Core i7-10700 2.90 GHz CPU."
Software Dependencies: No. Evidence: the paper mentions that "All experiments are implemented in R software" (Section 4.1) and refers to the `quadprog` R package (Appendix B.1), but specific version numbers for R or any packages are not provided.
Experiment Setup: Yes. Evidence: "Set the target sample size n_0 = 150, and source sample sizes (n_1, n_2, n_3) = (200, 200, 150). For the parametric components, x_i^(m) from the target and source models are generated from a 6-dimensional multivariate normal distribution N(0, Σ) with Σ = (Σ_{aa'})_{6×6}, where Σ_{aa'} = 0.5^{|a - a'|}. Set the parametric coefficient vectors of the target and source models as β^(0) = (1.4, 1.2, 1, 0.8, 0.65, 0.3)^T, β^(1) = (1.4, 1.2, 1, 0.8, 0.65, 0.3, 1.8)^T + δ_1, β^(2) = (1.4, 1.2, 1, 0.8, 0.65, 0.3)^T + δ_2, and β^(3) = (1.4, 1.2, 1, 0.8, 0.65, 0.3)^T + δ_3, where δ_1 = 0.02, δ_2 = 0.3, and δ_3 = 0, so the parameters of the first and second source models differ from the target model, and the third source model is informative because its coefficient is exactly the same as that of the target model."
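The simulation design quoted in the Experiment Setup row can be sketched in code. The paper's experiments are implemented in R; this NumPy translation is our own illustration, and the seed, variable names, and the elementwise application of the scalar shifts δ_m are assumptions. The excerpt does not specify how the seventh covariate of the first source model is drawn, so only the six shared covariates are generated here.

```python
import numpy as np

rng = np.random.default_rng(2023)  # arbitrary seed, not from the paper

# Sample sizes: target n_0 = 150, sources (n_1, n_2, n_3) = (200, 200, 150).
sizes = [150, 200, 200, 150]

# AR(1)-style covariance with entries Sigma[a, a'] = 0.5^{|a - a'|}, 6 x 6.
idx = np.arange(6)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])

# Parametric coefficients; scalar deltas applied elementwise (our reading).
base = np.array([1.4, 1.2, 1.0, 0.8, 0.65, 0.3])
beta = {
    0: base,                          # target model beta^(0)
    1: np.append(base, 1.8) + 0.02,   # beta^(1): extra coefficient, delta_1 = 0.02
    2: base + 0.3,                    # beta^(2): delta_2 = 0.3
    3: base + 0.0,                    # beta^(3): delta_3 = 0, informative source
}

# Draw x_i^(m) ~ N(0, Sigma) for the six shared covariates of each model.
X = {m: rng.multivariate_normal(np.zeros(6), Sigma, size=n)
     for m, n in enumerate(sizes)}
```

Under this design, only the third source model shares the target's coefficients exactly, which is what makes it the informative source in the paper's terminology.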
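The J-fold cross-validation split described in the Dataset Splits row (with the paper's default J = 5) amounts to a random partition of the target-sample indices into mutually exclusive subgroups G_1, ..., G_J. A minimal sketch, with our own function name and an arbitrary seed:

```python
import numpy as np

def j_fold_groups(n_target, J=5, seed=0):
    """Randomly partition target indices 0, ..., n_target - 1 into
    J mutually exclusive, jointly exhaustive subgroups."""
    if J <= 1:
        raise ValueError("the criterion requires J > 1")
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_target), J)

# Example: 5-fold split of the n_0 = 150 target samples.
groups = j_fold_groups(150, J=5)
```

Each subgroup then serves once as the held-out fold when evaluating candidate weights, while the separate 50/50 random split of the target samples plays the role of training and testing data for the out-of-sample prediction risk.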
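The Software Dependencies row notes that the authors solve their weight-selection criterion with the `quadprog` R package: choosing model-averaging weights is a quadratic program over the probability simplex (weights nonnegative and summing to one). As an illustrative stand-in, not the authors' implementation, here is a projected-gradient sketch in NumPy that minimizes the squared error of a weighted combination of candidate predictions; the toy data and all names are ours.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / k > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def averaging_weights(preds, y, iters=5000):
    """Minimize ||y - preds @ w||^2 / n over the simplex by
    projected gradient descent.

    preds: (n, M) matrix of candidate-model predictions on held-out data.
    """
    n, M = preds.shape
    w = np.full(M, 1.0 / M)                 # start from equal weights
    # Step size 1/L, with L the gradient's Lipschitz constant.
    step = n / (2.0 * np.linalg.eigvalsh(preds.T @ preds)[-1])
    for _ in range(iters):
        grad = -2.0 * preds.T @ (y - preds @ w) / n
        w = project_simplex(w - step * grad)
    return w

# Toy check: recover the weights when y is an exact simplex combination.
rng = np.random.default_rng(1)
preds = rng.normal(size=(200, 3))
w_true = np.array([0.7, 0.3, 0.0])
w_hat = averaging_weights(preds, preds @ w_true)
```

A dedicated QP solver such as `quadprog`'s `solve.QP` handles the same constrained problem directly; the projected-gradient loop is used here only to keep the sketch dependency-free.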