Causal Classification: Treatment Effect Estimation vs. Outcome Prediction

Authors: Carlos Fernández-Loría, Foster Provost

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "The theoretical results, as well as simulations, illustrate settings where outcome prediction should actually be better, including cases where (1) the bias may be partially corrected by choosing a different threshold, (2) outcomes and treatment effects are correlated, and (3) data to estimate counterfactuals are limited. A major practical implication is that, for some applications, it might be feasible to make good intervention decisions without any data on how individuals actually behave when intervened. Finally, we show that for a real online advertising application, outcome prediction models indeed excel at causal classification."
Researcher Affiliation | Academia | Carlos Fernández-Loría, HKUST Business School, Hong Kong University of Science and Technology, Hong Kong; Foster Provost, Stern School of Business, New York University, New York, NY, USA
Pseudocode | No | The paper includes Python code in Appendix C, but it is actual code, not pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | "Appendix C. Simulator Code. We present here the Python code we used to generate the data in Section 6."
Open Datasets | Yes | "We use data made available by Criteo (an advertising platform) based on randomly targeting advertising to a large sample of users (Diemert Eustache, Betlei Artem et al., 2018). ... See https://ailab.criteo.com/criteo-uplift-prediction-dataset/ for details and access to the data. We use the version of the data set without leakage."
Dataset Splits | Yes | "The models were trained and tuned with cross-validation using 80% of the sample (the training set). The targeting approaches were evaluated using the remaining 20% of the sample (the test set)."
Hardware Specification | No | The paper does not specify any hardware used to run the simulations or the real-world example in Appendix D.
Software Dependencies | No | The Python code in Appendix C imports NumPy ('import numpy as np'), but no version numbers for Python or NumPy are provided.
Experiment Setup | No | For the simulations, "Table 4 shows the default values used for the simulation parameters," but these are simulator parameters, not explicit hyperparameters for a machine learning model. For the practical example, "All the approaches were implemented using decision tree models," but no specific hyperparameters for the decision trees are provided.
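The paper's central comparison — deciding whom to target by predicting outcomes versus by estimating treatment effects — and its 80/20 evaluation protocol can be sketched in a few lines. The code below is a hypothetical illustration, not the paper's Appendix C simulator: it uses noisy stand-ins for fitted predictors and a made-up data-generating process in which effects and baseline outcomes are correlated, one of the settings where the paper argues outcome prediction can perform well at causal classification.

```python
import numpy as np

# Hypothetical sketch (NOT the paper's Appendix C code): when treatment
# effects are correlated with baseline outcomes, ranking individuals by a
# predicted outcome can recover most causal-classification decisions.
rng = np.random.default_rng(0)

n = 10_000
x = rng.normal(size=n)

baseline = x                                   # outcome without treatment
effect = 0.8 * x + 0.2 * rng.normal(size=n)    # effect correlated with baseline

# 80/20 split, mirroring the paper's evaluation protocol.
n_train = int(0.8 * n)
train, test = np.arange(n_train), np.arange(n_train, n)

# Noisy proxies standing in for fitted models; the point of interest is
# the downstream targeting decision, not the model-fitting step.
pred_outcome = baseline + rng.normal(scale=0.5, size=n)
pred_effect = effect + rng.normal(scale=0.5, size=n)

true_decision = effect[test] > 0               # ideal causal classification
outcome_decision = pred_outcome[test] > np.quantile(pred_outcome[train], 0.5)
effect_decision = pred_effect[test] > 0

acc_outcome = np.mean(outcome_decision == true_decision)
acc_effect = np.mean(effect_decision == true_decision)
print(f"outcome-prediction agreement: {acc_outcome:.2f}")
print(f"effect-estimation agreement:  {acc_effect:.2f}")
```

Under this toy setup both decision rules agree with the ideal causal classification on most test individuals, illustrating how a pure outcome-prediction model can make good intervention decisions without any data on behavior under intervention.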