Augmented Transfer Regression Learning with Semi-non-parametric Nuisance Models

Authors: Molei Liu, Yi Zhang, Katherine P. Liao, Tianxi Cai

JMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Simulation studies demonstrate that our method is more robust and efficient than existing methods under various configurations. We also examine the utility of our method through a real transfer learning example of the phenotyping algorithm for rheumatoid arthritis across different time windows. Performance of the four approaches is evaluated through root mean square error, bias, and coverage probability of the 95% confidence interval in terms of estimating and inferring β0, β1, β2, β3, as summarized in Tables A2–A5 of Appendix D for configurations (i)–(iv), respectively. The mean square error and absolute bias averaged over the target parameters, and the maximum deviance of the coverage probability from the nominal level 0.95 among all parameters, are summarized in Table 1.
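The summary metrics quoted above (RMSE, absolute bias, and deviation of empirical coverage from the nominal 0.95 level) can be computed per target parameter across simulation replicates. A minimal sketch, with a hypothetical `summarize` helper not taken from the paper:

```python
import numpy as np

def summarize(estimates, ci_lower, ci_upper, truth, level=0.95):
    """Summarize simulation replicates for one target parameter.

    estimates, ci_lower, ci_upper: length-n_reps arrays of point
    estimates and 95% confidence-interval endpoints; truth: the true
    parameter value. Returns RMSE, absolute bias, and the deviation
    of the empirical CI coverage from the nominal level.
    """
    estimates = np.asarray(estimates, dtype=float)
    rmse = np.sqrt(np.mean((estimates - truth) ** 2))
    abs_bias = abs(np.mean(estimates) - truth)
    covered = (np.asarray(ci_lower) <= truth) & (truth <= np.asarray(ci_upper))
    coverage_dev = abs(covered.mean() - level)
    return rmse, abs_bias, coverage_dev
```

The paper's Table 1 reports the mean square error and absolute bias averaged over β0..β3, and the maximum coverage deviation among all parameters; those aggregates would simply apply `mean` and `max` over the per-parameter outputs of a routine like this.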
Researcher Affiliation Academia Molei Liu EMAIL Department of Biostatistics, Columbia Mailman School of Public Health, New York, NY 10032, USA; Yi Zhang EMAIL Department of Statistics, Harvard University, Cambridge, MA 02138, USA; Katherine P. Liao EMAIL Department of Medicine, Rheumatology, Immunology, Brigham and Women's Hospital, Boston, MA 02115, USA; Tianxi Cai EMAIL Department of Biostatistics, Harvard Chan School of Public Health, Boston, MA 02115, USA
Pseudocode No The paper describes mathematical equations and estimation procedures in narrative text and formulas (e.g., equations 9, 10, 11, 13) within sections like '2.3 Estimation Procedure for β̂ATReL'. However, it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured, step-by-step instructions in a code-like format.
Open Source Code No The paper includes a license statement: 'License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v24/22-0700.html.' This refers to the paper's license and attribution, not the release of source code for the methodology described in the paper. There is no explicit statement indicating the release of code or a link to a code repository.
Open Datasets No The paper mentions a 'real transfer learning example of the phenotyping algorithm for rheumatoid arthritis across different time windows' and refers to 'EHR data' from 'Mass General Brigham (MGB)'. It states: 'There are a total of 200 labeled patients with true RA status, Y, manually annotated via chart review.' This implies a specific, likely internal or restricted-access, dataset. No concrete access information (link, DOI, repository, or formal citation with authors/year for public access) is provided for this dataset.
Dataset Splits Yes Specifically, we randomly split the source samples into K equal-sized disjoint sets, indexed by I_1, ..., I_K, with {1, ..., n} = ∪_{k=1}^{K} I_k, and denote I_{-k} = {1, ..., n} \ I_k. ... We use cross-fitting with K = 5 folds for our method and the two DML estimators.
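The cross-fitting split quoted above (K equal-sized disjoint folds I_1..I_K, with I_{-k} the complement used to fit nuisance models evaluated on fold I_k) can be sketched as follows; the function name `cross_fit_folds` is illustrative, not from the paper:

```python
import numpy as np

def cross_fit_folds(n, K=5, seed=0):
    """Randomly split indices {0, ..., n-1} into K equal-sized
    disjoint folds I_1, ..., I_K. Returns (I_k, I_{-k}) pairs, where
    I_{-k} is the complement of fold k: nuisance models are trained
    on I_{-k} and evaluated on the held-out fold I_k."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    folds = np.array_split(perm, K)
    return [(fold, np.setdiff1d(perm, fold)) for fold in folds]
```

With K = 5, as used in the paper for both the proposed method and the two DML estimators, each fold holds n/5 samples and its complement the remaining 4n/5.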
Hardware Specification No The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running its experiments or simulations.
Software Dependencies No The paper describes various statistical and machine learning methods (e.g., logistic regression, kernel smoothing, sieve estimation, machine learning algorithms like lasso, random forest, and neural networks) but does not provide specific software names with version numbers for their implementation. For example, it does not mention 'Python 3.x' or 'PyTorch 1.x'.
Experiment Setup Yes We set the loading vector c as (1, 0, 0, 0)^T, (0, 1, 0, 0)^T, (0, 0, 1, 0)^T, and (0, 0, 0, 1)^T to estimate β0, β1, β2, β3 separately. ... we add a ridge penalty tuned by cross-validation with tuning parameter of order n^{-2/3} (below the parametric rate) to enhance the training stability. ... all the tuning parameters, including the bandwidth of our method and kernel machine and the coefficients of the penalty functions, are selected by 5-fold cross-validation on the training samples.
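The ridge tuning described above can be sketched as a 5-fold cross-validation over a penalty grid scaled to order n^{-2/3}. This is a minimal illustration, not the paper's implementation: the grid constants, the `cv_ridge` name, and the squared-error CV criterion are all assumptions; the paper only states the order of the tuning parameter and the use of 5-fold cross-validation.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_ridge(X, y, K=5, seed=0):
    """Select the ridge penalty by K-fold cross-validation from a
    grid of order n^{-2/3} (hypothetical grid constants)."""
    n = len(y)
    grid = [g * n ** (-2.0 / 3.0) for g in (0.25, 0.5, 1.0, 2.0, 4.0)]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    scores = []
    for lam in grid:
        err = 0.0
        for fold in folds:
            mask = np.ones(n, dtype=bool)
            mask[fold] = False  # train on the complement of the fold
            beta = ridge_fit(X[mask], y[mask], lam)
            err += np.mean((y[fold] - X[fold] @ beta) ** 2)
        scores.append(err / K)
    return grid[int(np.argmin(scores))]
```

The same cross-validation loop would apply to the other tuning parameters mentioned (kernel bandwidths, penalty coefficients), with the grid and loss adapted to each.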