Counterfactual Learning with Multioutput Deep Kernels
Authors: Alberto Caron, Ioanna Manolopoulou, Gianluca Baio
TMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the first part of the work, we rely on Structural Causal Models (SCM) to formally introduce the setup and the problem of identifying counterfactual quantities under observed confounding. We then discuss the benefits of tackling the task of causal effects estimation via stacked coregionalized Gaussian Processes and Deep Kernels. Finally, we demonstrate the use of the proposed methods on simulated experiments that span individual causal effects estimation, off-policy evaluation and optimization. We evaluate the performance of counterfactual GPs and counterfactual DKL on a data generating process with three different tasks, and on a real-world example combining experimental and observational data. |
| Researcher Affiliation | Academia | Alberto Caron, Department of Statistical Science, University College London, and The Alan Turing Institute, London, UK. Gianluca Baio, Department of Statistical Science, University College London. Ioanna Manolopoulou, Department of Statistical Science, University College London. |
| Pseudocode | No | The paper describes methods and architectures (e.g., Figure 3), but it does not contain any clearly labeled pseudocode or algorithm blocks with structured steps. |
| Open Source Code | Yes | We demonstrate the use of CounterDKL with simulated experiments on causal effects estimation, off-policy evaluation (OPE) and learning off-policy (OPL) problems (Dudík et al., 2011; Dudík et al., 2014; Farajtabar et al., 2018; Kallus, 2021), also providing a Python implementation of the models, based on GPyTorch. Full code at: https://github.com/albicaron/CounterDKL |
| Open Datasets | Yes | We demonstrate the efficiency of CounterDKL also on a second experiment taken from Shalit et al. (2017), involving a popular real-world study on a job training program, dating back to LaLonde (1986). Finally, we compare CounterDKL with a few other recent methods for causal effects estimation, on a popular simulated experiment utilizing the Infant Health Development Program (IHDP) data, originally found in Hill (2011), and more recently in several contributions on Conditional Average Treatment Effects (CATE) estimation. We make use of some of the popular datasets for classification in the open-source UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/index.php) |
| Dataset Splits | Yes | ICE: The first is the prediction of Individual Causal Effects (ICE). This tackles the estimation of the average causal effect of playing action Ai = a on outcome Yi, given a certain realization of the covariates space, Xi = x, i.e. the estimation of ICE: E(Yi|do(Ai = a), Xi = xi). This is carried out using an 80% training set, and evaluated via RMSE on a 20% left-out test set. Results on performance are gathered in Table 1, in terms of 70%-30% train and test set Mean Absolute Error (MAE) on ATT, Policy Risk Rpol and average runtime, accompanied by 10-fold cross-validated 95% error intervals. Results reported in Table 2 refer to 1000 replications of the experiment on an 80%-20% train-test split as in Alaa & van der Schaar (2017). |
| Hardware Specification | Yes | All experiments were run on an Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz with 8 GB RAM. |
| Software Dependencies | No | The paper mentions a 'Python implementation', 'GPyTorch', and the 'Adam solver'. However, it does not provide specific version numbers for these software components in the main text to ensure reproducibility. |
| Experiment Setup | Yes | DKL models employed a feedforward neural network with three hidden layers of sizes [50, 50, 2] before the GP layer, which itself employs an RBF base kernel. The multitask and multioutput models (both GPs and DKLs) all make use of the Intrinsic Coregionalization Model (ICM), such that K(xi, xi') = BY ⊗ BA ⊗ Kq(xi, xi'). All models were optimized through the Adam solver. The autoencoder deep structure employed for the "AutoEnc + GP" and "AutoEnc + Counter GP" models similarly learns a 2-dimensional encoded lower-dimensional representation, where the encoder has two hidden layers of sizes [10, 5] before the 2-dim representation and the decoder has hidden layers of sizes [5, 10] before the reconstruction loss. Our Counterfactual DKL (CounterDKL) uses hidden layers of sizes [100, 100, 2]. |
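The ICM kernel quoted in the Experiment Setup row takes the Kronecker product of a coregionalization matrix with a base input kernel. A minimal NumPy sketch of that structure, where a single illustrative coregionalization matrix `B` stands in for the paper's BY ⊗ BA product (all names and values here are assumptions, not taken from the paper's code):

```python
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0):
    # Base RBF (squared-exponential) kernel K_q(x, x')
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def icm_kernel(X1, X2, B, lengthscale=1.0):
    # Intrinsic Coregionalization Model: K = B ⊗ K_q(x, x'),
    # where B is a (tasks x tasks) coregionalization matrix
    # capturing covariance across outputs/treatment arms.
    return np.kron(B, rbf_kernel(X1, X2, lengthscale))

# Toy example: 2 tasks (e.g. treatment arms), 3 input points.
X = np.random.default_rng(0).normal(size=(3, 2))
B = np.array([[1.0, 0.6],
              [0.6, 1.0]])  # illustrative cross-task covariance
K = icm_kernel(X, X, B)
print(K.shape)  # → (6, 6): 2 tasks x 3 points
```

In the paper's DKL variants, the raw inputs would first pass through the feedforward network (e.g. the [50, 50, 2] layers) before entering the base kernel; the Kronecker structure over tasks is unchanged.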