Revisiting inference after prediction
Authors: Keshav Motwani, Daniela Witten
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Sections 3 and 4, we investigate the empirical consequences of our findings from Section 2. These empirical investigations paint a clear picture: namely, that failure to target the correct parameter has substantial statistical consequences for the proposal of Wang et al. (2020), in the form of hypothesis tests that fail to control the Type 1 error, and confidence intervals that fail to attain the nominal coverage. The proposal of Angelopoulos et al. (2023) does not suffer these consequences, as it targets the correct parameter. We close with a discussion in Section 5. In this paper, we use capitals to represent a random variable and lower case to represent its realization. Vectors of length equal to the number of observations, or matrices whose rows correspond to the observations, are in bold. |
| Researcher Affiliation | Academia | Keshav Motwani, Department of Biostatistics, University of Washington, Seattle, WA; Daniela Witten, Departments of Biostatistics and Statistics, University of Washington, Seattle, WA |
| Pseudocode | Yes | Algorithm 1 (Bootstrap correction of Wang et al. (2020)). The goal is to conduct inference on the association between Y and X. 1. Use (y_lab, f̂(z_lab)) to fit the relationship model Y \| f̂(Z) ∼ K(f̂(Z), φ), yielding φ̂. 2. For b = 1, …, B: 2.1. Sample unlabeled observations with replacement to obtain z̃ᵇ_unlab and x̃ᵇ_unlab. 2.2. Sample outcomes ỹᵇ \| f̂(z̃ᵇ_unlab) from the relationship model K(f̂(z̃ᵇ_unlab), φ̂). 2.3. Use (ỹᵇ, x̃ᵇ_unlab) to fit a regression model for the relationship between Y and X, and record the coefficient estimate β̂ᵇ and model-based standard error ŝᵇ. 3. Compute the point estimate β̂ = median{β̂¹, …, β̂ᴮ}. 4. Compute the nonparametric standard error ŜE(β̂) = SD{β̂¹, …, β̂ᴮ}. 5. Compute the parametric standard error ŜE(β̂) = median{ŝ¹, …, ŝᴮ}. |
| Open Source Code | Yes | Code Availability: Scripts to reproduce the results in this manuscript are available at https://github.com/keshav-motwani/PredictionBasedInference/. Our code is based on the code from Wang et al. (2020); we thank the authors for making it publicly accessible. |
| Open Datasets | No | We consider a simple simulation setting, inspired by the "Simulated Data: Continuous" case section of Wang et al. (2020). They generate three datasets: a training dataset consisting of realizations of (Z, X, Y) used to train a machine learning model f̂(·), a labeled dataset consisting of realizations of (Z, X, Y), and an unlabeled dataset consisting only of realizations of (Z, X); both the labeled and unlabeled datasets are used for inference. They consider predictors Z ∈ ℝ⁴ and response Y ∈ ℝ, and define the covariate X ≡ Z₁. In Wang et al. (2020)'s paper, the training, labeled, and unlabeled datasets each consist of 300 observations. Throughout this section, we keep the training sample size fixed at 300 observations, but vary the size of the labeled and unlabeled datasets. As in Wang et al. (2020), we generate the training, labeled, and unlabeled datasets from the same partially linear additive model Y = β₀ + β₁Z₁ + Σⱼ₌₂⁴ βⱼgⱼ(Zⱼ) + ε. Explanation: The paper describes a simulation study where the authors generate their own datasets (training, labeled, unlabeled) from a partially linear additive model. It does not use or provide access to any external, publicly available datasets. |
| Dataset Splits | Yes | They generate three datasets: a training dataset consisting of realizations of (Z, X, Y) used to train a machine learning model f̂(·), a labeled dataset consisting of realizations of (Z, X, Y), and an unlabeled dataset consisting only of realizations of (Z, X); both the labeled and unlabeled datasets are used for inference. They consider predictors Z ∈ ℝ⁴ and response Y ∈ ℝ, and define the covariate X ≡ Z₁. In Wang et al. (2020)'s paper, the training, labeled, and unlabeled datasets each consist of 300 observations. Throughout this section, we keep the training sample size fixed at 300 observations, but vary the size of the labeled and unlabeled datasets. As in Wang et al. (2020), we generate the training, labeled, and unlabeled datasets from the same partially linear additive model Y = β₀ + β₁Z₁ + Σⱼ₌₂⁴ βⱼgⱼ(Zⱼ) + ε. In each replicate of the simulation study, we generate a new labeled and unlabeled dataset as described above. We perform a total of 1,000 simulation replicates. We consider two settings: one under the null (β₁ = 0) and one under the alternative (β₁ = 1). Figures 1, 2, 3, and 4 show various sample sizes for n_lab and n_unlab, e.g., "n_lab = 100, n_unlab = 1000" and "n_lab = 0.1 n_unlab". |
| Hardware Specification | No | Explanation: The paper does not explicitly mention any specific hardware used for running its experiments, such as GPU models, CPU models, or cloud computing specifications. |
| Software Dependencies | No | We generate 3 training sets and fit a GAM to each training set, to obtain three fitted models f̂₁, f̂₂, f̂₃. Explanation: The paper mentions fitting GAMs but does not specify the software libraries or version numbers used in its experiments. |
| Experiment Setup | Yes | We consider a simple simulation setting, inspired by the "Simulated Data: Continuous" case section of Wang et al. (2020). They generate three datasets: a training dataset consisting of realizations of (Z, X, Y) used to train a machine learning model f̂(·), a labeled dataset consisting of realizations of (Z, X, Y), and an unlabeled dataset consisting only of realizations of (Z, X); both the labeled and unlabeled datasets are used for inference. They consider predictors Z ∈ ℝ⁴ and response Y ∈ ℝ, and define the covariate X ≡ Z₁. In Wang et al. (2020)'s paper, the training, labeled, and unlabeled datasets each consist of 300 observations. Throughout this section, we keep the training sample size fixed at 300 observations, but vary the size of the labeled and unlabeled datasets. As in Wang et al. (2020), we generate the training, labeled, and unlabeled datasets from the same partially linear additive model Y = β₀ + β₁Z₁ + Σⱼ₌₂⁴ βⱼgⱼ(Zⱼ) + ε. We consider two settings: one under the null (β₁ = 0) and one under the alternative (β₁ = 1). We generate 3 training sets and fit a GAM to each training set, to obtain three fitted models f̂₁, f̂₂, f̂₃. In each replicate of the simulation study, we generate a new labeled and unlabeled dataset as described above. We perform a total of 1,000 simulation replicates. |
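The Algorithm 1 pseudocode quoted in the table above can be sketched in code. The sketch below is a minimal illustration, not the authors' implementation: it assumes a Gaussian linear relationship model for Y given f̂(Z) and simple linear regression of Y on X, whereas the algorithm leaves both model families generic; the function and variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def wang_bootstrap(y_lab, f_lab, f_unlab, x_unlab, B=100):
    """Sketch of the bootstrap correction of Wang et al. (2020), Algorithm 1.

    Assumes Y | f(Z) ~ N(a + b*f(Z), sigma^2) as the relationship model
    and simple linear regression of Y on X as the downstream model.
    """
    # Step 1: fit the relationship model on the labeled data.
    A = np.column_stack([np.ones_like(f_lab), f_lab])
    phi, *_ = np.linalg.lstsq(A, y_lab, rcond=None)
    sigma = (y_lab - A @ phi).std(ddof=2)

    n = len(f_unlab)
    betas, ses = [], []
    for _ in range(B):
        # Step 2.1: resample the unlabeled observations with replacement.
        idx = rng.integers(0, n, size=n)
        fb, xb = f_unlab[idx], x_unlab[idx]
        # Step 2.2: simulate outcomes from the fitted relationship model.
        yb = phi[0] + phi[1] * fb + rng.normal(0, sigma, size=n)
        # Step 2.3: regress simulated outcomes on X; record the slope
        # estimate and its model-based standard error.
        Xb = np.column_stack([np.ones(n), xb])
        coef, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
        resid = yb - Xb @ coef
        s2 = (resid @ resid) / (n - 2)
        se = np.sqrt(s2 * np.linalg.inv(Xb.T @ Xb)[1, 1])
        betas.append(coef[1])
        ses.append(se)

    betas, ses = np.array(betas), np.array(ses)
    # Steps 3-5: median point estimate, nonparametric SE (SD across
    # bootstrap replicates), and parametric SE (median model-based SE).
    return np.median(betas), betas.std(ddof=1), np.median(ses)
```

As the paper's Sections 3 and 4 argue, simulating outcomes from the fitted relationship model in step 2.2 is exactly what causes the procedure to target the wrong parameter; the sketch is useful only for seeing the mechanics being critiqued.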
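The simulation setup quoted above (a partially linear additive model, with a training set fixed at 300 observations and varying labeled/unlabeled sizes) can be sketched as follows. The nonlinear functions g_j, the coefficient values, and the noise distribution below are illustrative placeholders, since the excerpts do not specify the ones used in the paper.

```python
import numpy as np

def simulate_dataset(n, beta1, rng):
    """Draw n observations from a partially linear additive model
    Y = b0 + b1*Z1 + sum_{j=2}^{4} b_j g_j(Z_j) + eps, with X = Z1.

    The g_j and coefficients here are hypothetical stand-ins.
    """
    Z = rng.normal(size=(n, 4))
    X = Z[:, 0]                       # covariate of interest, X = Z1
    g = [np.sin, np.square, np.tanh]  # placeholder g_2, g_3, g_4
    nonlinear = sum(g[j](Z[:, j + 1]) for j in range(3))
    Y = 1.0 + beta1 * X + nonlinear + rng.normal(size=n)
    return Z, X, Y

# One simulation replicate under the null (beta1 = 0): training set of
# 300 observations, with example labeled/unlabeled sizes from Figure 1.
rng = np.random.default_rng(1)
Z_tr, X_tr, Y_tr = simulate_dataset(300, beta1=0.0, rng=rng)
Z_lab, X_lab, Y_lab = simulate_dataset(100, beta1=0.0, rng=rng)
Z_unlab, X_unlab, _ = simulate_dataset(1000, beta1=0.0, rng=rng)  # Y discarded
```

In the paper's design, each of the 1,000 replicates redraws the labeled and unlabeled datasets while the three GAMs fitted on the training sets stay fixed; repeating the last two calls inside a loop reproduces that structure.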