Augmented Transfer Regression Learning with Semi-non-parametric Nuisance Models

Authors: Molei Liu, Yi Zhang, Katherine P. Liao, Tianxi Cai

JMLR 2023

Reproducibility Variable Result LLM Response
Research Type Experimental Simulation studies demonstrate that our method is more robust and efficient than existing methods under various configurations. We also examine the utility of our method through a real transfer learning example of the phenotyping algorithm for rheumatoid arthritis across different time windows. Performance of the four approaches is evaluated through root mean square error, bias, and coverage probability of the 95% confidence interval in terms of estimating and inferring β0, β1, β2, β3, as summarized in Tables A2–A5 of Appendix D for configurations (i)–(iv), respectively. The mean square error and absolute bias averaged over the target parameters, and the maximum deviance of the coverage probability from the nominal level 0.95 among all parameters, are summarized in Table 1.
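The summary metrics quoted above (RMSE, absolute bias, and deviation of empirical coverage from the nominal 0.95 level) can be computed per target parameter across simulation replicates. A minimal sketch, with a hypothetical `summarize` helper not taken from the paper:

```python
import numpy as np

def summarize(estimates, ci_lower, ci_upper, truth, level=0.95):
    """Summarize simulation replicates for one target parameter.

    estimates, ci_lower, ci_upper: length-n_reps arrays of point
    estimates and 95% confidence-interval endpoints; truth: the true
    parameter value. Returns RMSE, absolute bias, and the deviation
    of the empirical CI coverage from the nominal level.
    """
    estimates = np.asarray(estimates, dtype=float)
    rmse = np.sqrt(np.mean((estimates - truth) ** 2))
    abs_bias = abs(np.mean(estimates) - truth)
    covered = (np.asarray(ci_lower) <= truth) & (truth <= np.asarray(ci_upper))
    coverage_dev = abs(covered.mean() - level)
    return rmse, abs_bias, coverage_dev
```

The paper's Table 1 reports the mean square error and absolute bias averaged over β0..β3, and the maximum coverage deviation among all parameters; those aggregates would simply apply `mean` and `max` over the per-parameter outputs of a routine like this.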
Researcher Affiliation Academia Molei Liu EMAIL Department of Biostatistics, Columbia Mailman School of Public Health, New York, NY 10032, USA; Yi Zhang EMAIL Department of Statistics, Harvard University, Cambridge, MA 02138, USA; Katherine P. Liao EMAIL Department of Medicine, Rheumatology, Immunology, Brigham and Women's Hospital, Boston, MA 02115, USA; Tianxi Cai EMAIL Department of Biostatistics, Harvard Chan School of Public Health, Boston, MA 02115, USA
Pseudocode No The paper describes mathematical equations and estimation procedures in narrative text and formulas (e.g., equations 9, 10, 11, 13) within sections like '2.3 Estimation Procedure for β̂ATReL'. However, it does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured, step-by-step instructions in a code-like format.
Open Source Code No The paper includes a license statement: 'License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v24/22-0700.html.' This refers to the paper's license and attribution, not the release of source code for the methodology described in the paper. There is no explicit statement indicating the release of code or a link to a code repository.
Open Datasets No The paper mentions a 'real transfer learning example of the phenotyping algorithm for rheumatoid arthritis across different time windows' and refers to 'EHR data' from 'Mass General Brigham (MGB)'. It states: 'There are a total of 200 labeled patients with true RA status, Y, manually annotated via chart review.' This implies a specific, likely internal or restricted-access, dataset. No concrete access information (link, DOI, repository, or formal citation with authors/year for public access) is provided for this dataset.
Dataset Splits Yes Specifically, we randomly split the source samples into K equal-sized disjoint sets, indexed by I_1, ..., I_K, with {1, ..., n} = ∪_{k=1}^{K} I_k, and denote I_{-k} = {1, ..., n} \ I_k. ... We use cross-fitting with K = 5 folds for our method and the two DML estimators.
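The cross-fitting split quoted above (K equal-sized disjoint folds I_1..I_K, with I_{-k} the complement used to fit nuisance models evaluated on fold I_k) can be sketched as follows; the function name `cross_fit_folds` is illustrative, not from the paper:

```python
import numpy as np

def cross_fit_folds(n, K=5, seed=0):
    """Randomly split indices {0, ..., n-1} into K equal-sized
    disjoint folds I_1, ..., I_K. Returns (I_k, I_{-k}) pairs, where
    I_{-k} is the complement of fold k: nuisance models are trained
    on I_{-k} and evaluated on the held-out fold I_k."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    folds = np.array_split(perm, K)
    return [(fold, np.setdiff1d(perm, fold)) for fold in folds]
```

With K = 5, as used in the paper for both the proposed method and the two DML estimators, each fold holds n/5 samples and its complement the remaining 4n/5.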
Hardware Specification No The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running its experiments or simulations.
Software Dependencies No The paper describes various statistical and machine learning methods (e.g., logistic regression, kernel smoothing, sieve estimation, machine learning algorithms like lasso, random forest, and neural networks) but does not provide specific software names with version numbers for their implementation. For example, it does not mention 'Python 3.x' or 'PyTorch 1.x'.
Experiment Setup Yes We set the loading vector c as (1, 0, 0, 0)^T, (0, 1, 0, 0)^T, (0, 0, 1, 0)^T, and (0, 0, 0, 1)^T to estimate β0, β1, β2, β3 separately. ... we add a ridge penalty tuned by cross-validation with tuning parameter of order n^{-2/3} (below the parametric rate) to enhance the training stability. ... all the tuning parameters, including the bandwidth of our method and kernel machine and the coefficients of the penalty functions, are selected by 5-fold cross-validation on the training samples.
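The ridge tuning described above can be sketched as a 5-fold cross-validation over a penalty grid scaled to order n^{-2/3}. This is a minimal illustration, not the paper's implementation: the grid constants, the `cv_ridge` name, and the squared-error CV criterion are all assumptions; the paper only states the order of the tuning parameter and the use of 5-fold cross-validation.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam I)^{-1} X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_ridge(X, y, K=5, seed=0):
    """Select the ridge penalty by K-fold cross-validation from a
    grid of order n^{-2/3} (hypothetical grid constants)."""
    n = len(y)
    grid = [g * n ** (-2.0 / 3.0) for g in (0.25, 0.5, 1.0, 2.0, 4.0)]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)
    scores = []
    for lam in grid:
        err = 0.0
        for fold in folds:
            mask = np.ones(n, dtype=bool)
            mask[fold] = False  # train on the complement of the fold
            beta = ridge_fit(X[mask], y[mask], lam)
            err += np.mean((y[fold] - X[fold] @ beta) ** 2)
        scores.append(err / K)
    return grid[int(np.argmin(scores))]
```

The same cross-validation loop would apply to the other tuning parameters mentioned (kernel bandwidths, penalty coefficients), with the grid and loss adapted to each.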