Do causal predictors generalize better to new domains?
Authors: Vivian Nastl, Moritz Hardt
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study how well machine learning models trained on causal features generalize across domains. We consider 16 prediction tasks on tabular datasets...allowing us to test how well a model trained in one domain performs in another. |
| Researcher Affiliation | Academia | Vivian Y. Nastl, Max Planck Institute for Intelligent Systems, Tübingen, Germany; Tübingen AI Center; Max Planck ETH Center for Learning Systems. Moritz Hardt, Max Planck Institute for Intelligent Systems, Tübingen, Germany; Tübingen AI Center. |
| Pseudocode | No | The paper describes experimental procedures and methods in paragraph text and figures, but it does not include formal pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Our code is based on Gardner et al. [2023], Hardt and Kim [2023] and Gulrajani and Lopez-Paz [2020]. It is available at https://github.com/socialfoundations/causal-features. |
| Open Datasets | Yes | We consider 16 prediction tasks on tabular datasets from prior work [Ding et al., 2021, Hardt and Kim, 2023, Gardner et al., 2023]...Table 1: Description of tasks, data sources and number of features in each selection. |
| Dataset Splits | Yes | We have a train/test/validation split within the in-domain set, and a test/validation split within the out-of-domain set. |
| Hardware Specification | Yes | Each job was given the same computing resources: 1 CPU. Compute nodes use AMD EPYC 7662 64-core CPUs. Memory was allocated as required for each task: all jobs were allocated at least 128GB of RAM; for the task Public Coverage, jobs were allocated 384GB of RAM. |
| Software Dependencies | No | The paper mentions several software components and libraries, such as 'Hyperopt [Bergstra et al., 2013]' and machine learning algorithms (XGBoost, LightGBM, IRM, REx, etc.), but it does not specify their version numbers. |
| Experiment Setup | Yes | We conduct a hyperparameter sweep using Hyperopt [Bergstra et al., 2013] on the in-domain validation data. A method is tuned for 50 trials. We exclusively train on the training set. |
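The split scheme quoted above (train/test/validation within the in-domain set, test/validation within the out-of-domain set) can be sketched with standard-library Python. The 80/10/10 and 50/50 proportions are assumptions for illustration; the quoted text does not state the fractions.

```python
import random

def make_splits(in_domain, out_of_domain, seed=0):
    """Sketch of the paper's split scheme: in-domain records are divided
    into train/test/validation, out-of-domain records into test/validation.
    The split fractions (80/10/10 and 50/50) are assumed, not from the paper."""
    rng = random.Random(seed)
    ind, ood = list(in_domain), list(out_of_domain)
    rng.shuffle(ind)
    rng.shuffle(ood)
    n, m = len(ind), len(ood)
    return {
        "id_train": ind[: int(0.8 * n)],          # used for model fitting only
        "id_test": ind[int(0.8 * n): int(0.9 * n)],
        "id_val": ind[int(0.9 * n):],             # used for hyperparameter selection
        "ood_test": ood[: m // 2],                # measures cross-domain generalization
        "ood_val": ood[m // 2:],
    }

splits = make_splits(range(100), range(40))
print({k: len(v) for k, v in splits.items()})
# → {'id_train': 80, 'id_test': 10, 'id_val': 10, 'ood_test': 20, 'ood_val': 20}
```

Keeping the out-of-domain validation set separate from the in-domain one matters here: selecting hyperparameters on in-domain validation data, as the paper does, avoids leaking out-of-domain information into model selection.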
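The experiment-setup row describes a 50-trial hyperparameter sweep selected on in-domain validation data. The paper uses Hyperopt's TPE search; the standard-library random search below is a hedged stand-in that keeps the same structure (fixed trial budget, train-only fitting, validation-based selection). The search space and scoring callback are illustrative assumptions.

```python
import random

def tune(train, val_score, n_trials=50, seed=0):
    """Random-search stand-in for the paper's 50-trial Hyperopt sweep.
    `val_score(train, params)` is assumed to fit a model on the training
    set with `params` and return its in-domain validation score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": 10 ** rng.uniform(-3, 0),  # log-uniform draw (assumed range)
            "max_depth": rng.randint(2, 10),
        }
        score = val_score(train, params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy scoring function standing in for fit-then-validate; peaks at learning_rate ≈ 0.1.
best_params, best_score = tune(None, lambda train, p: -(p["learning_rate"] - 0.1) ** 2)
print(best_params)
```

Hyperopt's TPE algorithm adapts its proposals to past trial results, so at the same 50-trial budget it typically finds better configurations than this uniform random search; the control flow is otherwise the same.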