AutoCATE: End-to-End, Automated Treatment Effect Estimation
Authors: Toon Vanderschueren, Tim Verdonck, Mihaela van der Schaar, Wouter Verbeke
ICML 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section analyzes AutoCATE's design choices per stage: evaluation (5.2), estimation (5.3), and ensembling (5.4). We identify best practices and benchmark the resulting configuration against common alternatives (5.5). Our experiments compare various automated, end-to-end strategies for learning a CATE estimation pipeline. Using AutoCATE, we can evaluate a range of design choices. To obtain general insights, we leverage a collection of standard benchmarks for CATE estimation: IHDP (Hill, 2011), ACIC (Dorie et al., 2019), News (Johansson et al., 2016), and Twins (Louizos et al., 2017); see Appendix C for details. These semi-synthetic benchmarks include 247 distinct data sets that vary in outcome (regression and classification), dimensionality, size, and application area, allowing for a comprehensive analysis of AutoCATE. |
| Researcher Affiliation | Academia | 1 KU Leuven, 2 University of Antwerp, 3 University of Cambridge. Correspondence to: Toon Vanderschueren <EMAIL>. |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. It describes the methods in narrative text and illustrates the stages of AutoCATE with figures such as Figure 1. |
| Open Source Code | Yes | To facilitate broad adoption and further research, we release AutoCATE as an open-source software package. The software package and accompanying experimental code are publicly available online at https://github.com/toonvds/AutoCATE. |
| Open Datasets | Yes | Our experiments compare various automated, end-to-end strategies for learning a CATE estimation pipeline. Using AutoCATE, we can evaluate a range of design choices. To obtain general insights, we leverage a collection of standard benchmarks for CATE estimation: IHDP (Hill, 2011), ACIC (Dorie et al., 2019), News (Johansson et al., 2016), and Twins (Louizos et al., 2017); see Appendix C for details. These semi-synthetic benchmarks include 247 distinct data sets that vary in outcome (regression and classification), dimensionality, size, and application area, allowing for a comprehensive analysis of AutoCATE. |
| Dataset Splits | Yes | Figure 3 presents results for different holdout ratios, illustrating this trade-off and showing that a holdout ratio of 30-50% generally works well. We use 30% in the rest of this work. Although more folds in cross-validation often improve model performance in supervised settings, we do not observe this effect for AutoCATE (see Table 5), likely due to the interaction between the number of folds and the holdout ratio. Finally, we include a stratified training-validation split and a stratified k-fold cross-validation procedure. Following the experiments in the main body, we use a 70/30% train-test split. |
| Hardware Specification | Yes | These experiments were conducted locally, on a machine with an AMD Ryzen 7 PRO 4750U processor (1.70 GHz), 32 GB of RAM, and a 64-bit operating system. |
| Software Dependencies | No | AutoCATE is implemented in Python, following scikit-learn's design principles (Pedregosa et al., 2011). Nevertheless, as the search is implemented with optuna (Akiba et al., 2019), we could use a range of optimizers. Where available, we use the CausalML implementations (Chen et al., 2020). |
| Experiment Setup | Yes | Table 3: Preprocessor search spaces. We describe the search spaces for the different preprocessors. If a hyperparameter is not mentioned, we use its default. All preprocessors are implemented with scikit-learn (Pedregosa et al., 2011); we refer to their documentation for more information. Table 4: Baselearner search spaces. We describe the search spaces for each baselearner. If a hyperparameter is not mentioned, we use its default. All baselearners are implemented with scikit-learn (Pedregosa et al., 2011); we refer to their documentation for more information. While efficient optimization strategies such as Bayesian approaches could be used, we use random search throughout this work to focus on other design choices in AutoCATE. |
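The Dataset Splits row quotes a stratified 70/30 train-validation split. A minimal sketch of what such a split looks like with scikit-learn, stratifying on the treatment indicator so both partitions keep the same treated/control ratio (all data here is synthetic and illustrative, not taken from the paper's benchmarks):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: covariates X, binary treatment t, outcome y (all synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
t = rng.integers(0, 2, size=1000)
y = X[:, 0] + 0.5 * t + rng.normal(size=1000)

# 70/30 train-validation split, stratified on treatment assignment so the
# treated/control proportions are preserved in both partitions.
X_tr, X_va, t_tr, t_va, y_tr, y_va = train_test_split(
    X, t, y, test_size=0.3, stratify=t, random_state=42
)
```

Stratifying on treatment (rather than splitting purely at random) matters for CATE estimation because a holdout with too few treated or control units makes validation-based model selection unreliable.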
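The Experiment Setup row states that hyperparameters are drawn by random search over per-component search spaces. The paper's implementation uses optuna for this; the sketch below illustrates the same idea with scikit-learn's `ParameterSampler` so it stays self-contained. The data, the hyperparameter grid, and the choice of base learner are all hypothetical placeholders, not the paper's actual search spaces:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import ParameterSampler, train_test_split

# Toy regression task standing in for fitting one baselearner (synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] ** 2 + rng.normal(scale=0.1, size=500)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

# Random search: sample configurations uniformly from a small search space
# and keep the one with the best score on the 30% holdout.
space = {"n_estimators": [50, 100, 200], "max_depth": [2, 4, 8, None]}
best_score, best_params = -np.inf, None
for params in ParameterSampler(space, n_iter=5, random_state=0):
    model = RandomForestRegressor(random_state=0, **params).fit(X_tr, y_tr)
    score = model.score(X_va, y_va)  # R^2 on the held-out validation set
    if score > best_score:
        best_score, best_params = score, params
```

As the quoted text notes, a more sample-efficient optimizer (e.g. a Bayesian sampler via optuna) could replace the uniform sampler without changing the surrounding loop.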