Optimization-based Causal Estimation from Heterogeneous Environments
Authors: Mingzhang Yin, Yixin Wang, David M. Blei
JMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We describe the theoretical foundations of this approach and demonstrate its effectiveness on simulated and real datasets. Compared to classical ML and existing methods, CoCo provides more accurate estimates of the causal model and more accurate predictions under interventions. Keywords: Causal estimation, Robust prediction, Constrained optimization, Directional derivative, Interventional data |
| Researcher Affiliation | Academia | Mingzhang Yin (EMAIL), Warrington College of Business, University of Florida, Gainesville, FL 32611, USA; Yixin Wang (EMAIL), Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA; David M. Blei (EMAIL), Department of Computer Science and Department of Statistics, Columbia University, New York, NY 10027, USA |
| Pseudocode | Yes | Algorithm 1 (CoCo with known exogenous variables). Input: data Dᵉ = {Yᵉ, Xᵉ}, Xᵉ ∈ ℝ^{nᵉ×p}; the risk function Rᵉ for each environment e ∈ E; the set of known non-descendant variables C; the predictor f(·). Output: coefficient estimate α̂ with causal interpretation. Initialize α randomly; while not converged: for each e in E, compute the gradient of the empirical risk gᵉ(α) = ∇α (1/nᵉ) Σᵢ₌₁^{nᵉ} Rᵉ(α; yᵢᵉ, ŷᵢᵉ), where ŷᵢᵉ = f(xᵢᵉ; α); set α̃ = α ⊙ (1 − 1_C) + 1_C; compute the optimization objective Lᵉ(α) = ‖gᵉ(α̃) ⊙ α̃‖²; end for; update α ← α − η ∇α Σ_{e∈E} Lᵉ(α) with step size η; end while. |
| Open Source Code | Yes | Code implementations for the empirical studies are available at https://github.com/mingzhang-yin/CoCo. |
| Open Datasets | Yes | 7.3 Colored MNIST (CMNIST): CMNIST is a semi-synthetic data set for binary classification introduced in Arjovsky et al. (2019), based on the MNIST data set... 7.4 Natural image classification: In this example, following Cloudera (2020), we adapt the iWildCam 2019 dataset (Beery et al., 2019) that contains wildlife images taken in the wild. |
| Dataset Splits | Yes | We generate two environments with γᵉ ∈ {0.5, 2.0}, each environment with 10,000 data points. As required, the DGPs leave the causal coefficient invariant. ... For the training environments, pᵉ ∈ {0.1, 0.2}; for the validation environment, pᵉ = 0.5; and for the testing environment, pᵉ = 0.9. ... Based on the setting of Cloudera (2020), we use images from two locations as the training data and images from another location as the test data. We use images from an additional location as the validation data. |
| Hardware Specification | No | No specific hardware details (like GPU/CPU models or cloud instance types) were mentioned for running the experiments. The paper only mentions using ResNet18 features, which is a model, not hardware. |
| Software Dependencies | No | No specific software versions (e.g., Python 3.x, PyTorch 1.x) were mentioned. The paper mentions using 'Adam optimizer' and referring to an 'IRM implementation' but without version numbers for these or other core libraries. |
| Experiment Setup | Yes | For the algorithms with tuning parameter λ, we report the best result for IRM with λ ∈ {2, 20, 200}, and for V-REx and RVP with λ ∈ {10, 10², 10³, 10⁴}. We choose the step size from {0.01, 0.1} that produces the lowest objective for each method. For all methods, the algorithm is considered to converge if the mean absolute difference between the parameters in consecutive iterations is less than 10⁻³ and the total iterations exceed 10⁴. ... For both CoCo and IRM, the penalty weight is chosen from ten values equally spaced from 1 to 100 on a log scale using the validation environments. The weight on the empirical risk term is reduced to 0 after 5k iterations. ... We use the Adam optimizer (Kingma and Ba, 2014) with learning rate 10⁻⁴. ... We set the weak condition weight λw = 10⁻⁴ and the risk regularizer weight λr = 1. λr is reduced to 10⁻⁵ after 100 epochs. The risk regularizer is an inductive bias to encourage nonzero solutions. After the optimizer is sufficiently far from the zero point, annealing the risk regularizer prevents the algorithm from minimizing the objective by reducing the risk function, hence preventing it from using the spurious association. CoCo is compared with ERM, IRM (Arjovsky et al., 2019), and V-REx (Krueger et al., 2020). All methods are trained with Adam with a learning rate of 10⁻³. |
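The pseudocode row above (Algorithm 1) can be sketched in NumPy for the linear, squared-error case. This is a minimal illustration, not the authors' implementation: the function names (`coco_objective`, `coco_fit`), the squared-error risk, and the finite-difference gradient of the objective are all assumptions made for the sketch; the paper's code at the linked repository is the authoritative version.

```python
import numpy as np

def coco_objective(alpha, envs, exo_mask):
    """Sum over environments of ||g_e(a) * a||^2, where a fixes the
    coordinates of known exogenous (non-descendant) variables to 1
    and g_e is the gradient of the per-environment empirical risk.
    Here the risk is assumed to be mean squared error (an assumption)."""
    a = alpha * (1.0 - exo_mask) + exo_mask   # mask step: a = alpha⊙(1−1_C) + 1_C
    total = 0.0
    for X, y in envs:
        n = len(y)
        g = (2.0 / n) * X.T @ (X @ a - y)     # gradient of the squared-error risk
        total += np.sum((g * a) ** 2)         # elementwise product, squared norm
    return total

def coco_fit(envs, p, exo_mask, eta=0.01, iters=500, eps=1e-6, seed=0):
    """Gradient descent on the CoCo objective; the gradient is taken by
    finite differences for simplicity (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(size=p)
    for _ in range(iters):
        base = coco_objective(alpha, envs, exo_mask)
        grad = np.zeros(p)
        for j in range(p):
            bumped = alpha.copy()
            bumped[j] += eps
            grad[j] = (coco_objective(bumped, envs, exo_mask) - base) / eps
        alpha -= eta * grad                   # update: alpha ← alpha − η ∇L(alpha)
    return alpha
```

Note that when every environment's risk gradient vanishes at the same coefficient vector (e.g., noiseless data generated by an invariant linear model), that vector makes the objective exactly zero, which is the fixed point the algorithm targets.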
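The stopping rule and the risk-regularizer annealing quoted in the Experiment Setup row are simple enough to state directly. The sketch below is illustrative only: the helper names are invented here, and the exponents are read as negative powers of ten (10⁻³ threshold, 10⁻⁵ annealed weight), which is an interpretation of the extracted text.

```python
import numpy as np

def has_converged(prev, curr, iteration, tol=1e-3, min_iters=10_000):
    """Convergence test described in the setup: mean absolute parameter
    change below tol AND more than min_iters total iterations."""
    return iteration > min_iters and np.mean(np.abs(curr - prev)) < tol

def risk_weight(epoch, lam_r=1.0, annealed=1e-5, cutoff=100):
    """Risk-regularizer weight λr = 1, reduced to 10⁻⁵ after 100 epochs."""
    return lam_r if epoch < cutoff else annealed
```

Annealing λr this way keeps the regularizer's pull toward nonzero solutions early on, then removes it so the optimizer cannot shrink the objective by exploiting the risk term (and thus the spurious association).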