Causal Classification: Treatment Effect Estimation vs. Outcome Prediction
Authors: Carlos Fernández-Loría, Foster Provost
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The theoretical results, as well as simulations, illustrate settings where outcome prediction should actually be better, including cases where (1) the bias may be partially corrected by choosing a different threshold, (2) outcomes and treatment effects are correlated, and (3) data to estimate counterfactuals are limited. A major practical implication is that, for some applications, it might be feasible to make good intervention decisions without any data on how individuals actually behave when intervened. Finally, we show that for a real online advertising application, outcome prediction models indeed excel at causal classification. |
| Researcher Affiliation | Academia | Carlos Fernández-Loría EMAIL HKUST Business School, Hong Kong University of Science and Technology, Hong Kong; Foster Provost EMAIL Stern School of Business, New York University, New York, NY, USA |
| Pseudocode | No | The paper includes Python code in Appendix C but it is actual code, not pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Appendix C. Simulator Code We present here the Python code we used to generate the data in Section 6. |
| Open Datasets | Yes | We use data made available by Criteo (an advertising platform) based on randomly targeting advertising to a large sample of users (Diemert Eustache, Betlei Artem et al., 2018). ... See https://ailab.criteo.com/criteo-uplift-prediction-dataset/ for details and access to the data. We use the version of the data set without leakage. |
| Dataset Splits | Yes | The models were trained and tuned with cross-validation using 80% of the sample (the training set). The targeting approaches were evaluated using the remaining 20% of the sample (the test set). |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the simulations or the real-world example in Appendix D. |
| Software Dependencies | No | The Python code in Appendix C imports the 'numpy' library ('import numpy as np'), but no specific version numbers for Python or NumPy are provided. |
| Experiment Setup | No | For the simulations, 'Table 4 shows the default values used for the simulation parameters.' These are parameters for the simulator, not explicit hyperparameters for a machine learning model. For the practical example, 'All the approaches were implemented using decision tree models.' but no specific hyperparameters for these decision tree models are provided. |
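
The split described in the Dataset Splits row (80% for training and cross-validated tuning, 20% held out for evaluation) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `train_test_split_indices` and `k_fold_indices`, the seed, the sample size of 1000, and the choice of 5 folds are all assumptions, since the paper reports none of them.

```python
import random


def train_test_split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into train/test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an assumption, for repeatability
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


def k_fold_indices(train_indices, n_folds=5):
    """Yield (fit, validation) index lists for cross-validation within the training set."""
    # Assign every n_folds-th training index to the same fold.
    folds = [train_indices[i::n_folds] for i in range(n_folds)]
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation


# Hypothetical sample of 1000 rows; the fold count is not stated in the paper.
train_idx, test_idx = train_test_split_indices(1000)
cv_folds = list(k_fold_indices(train_idx))
```

Models would be tuned only on the `cv_folds` pairs drawn from `train_idx`, and `test_idx` would be used once, to evaluate the targeting approaches.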