Learning Joint Interventional Effects from Single-Variable Interventions in Additive Models
Authors: Armin Kekić, Sergio Hernan Garrido Mejia, Bernhard Schölkopf
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on synthetic data demonstrate that our method achieves performance comparable to models trained directly on joint interventional data, outperforming a purely observational estimator. We compare four approaches: (i) our Intervention Generalization method, training the estimator (17) on observational and single-intervention data (Section 6); (ii) an estimator trained directly on joint interventional data (topline); (iii) a naive estimator trained on the pooled dataset of all observational and single-interventional data; and (iv) an estimator trained solely on observational data. |
| Researcher Affiliation | Collaboration | 1Max Planck Institute for Intelligent Systems, Tübingen, Germany; 2Amazon Research, Tübingen, Germany; 3Tübingen AI Center, Tübingen, Germany; 4ELLIS Institute, Tübingen, Germany. |
| Pseudocode | No | The paper describes the methodology in prose and mathematical notation within the main text and appendix sections, without presenting any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The code used for the experiments is available at github.com/akekic/intervention-generalization. |
| Open Datasets | No | Synthetic Data-Generating Process. We sample a structural causal model with five actions and confounders and causal relationships as shown in Figure 3. The structural assignments are second order polynomials with randomly sampled coefficients. The exogenous noises are Gaussian, uniform or logistic. The corresponding parameters are drawn at random before each experiment run. |
| Dataset Splits | Yes | We split each dataset into 80% training and 20% test data. |
| Hardware Specification | No | The paper does not specify any particular hardware components such as GPU models, CPU types, or memory used for running the experiments. It only mentions the number of data points for various experimental runs. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the implementation of the experiments. |
| Experiment Setup | Yes | We train third-order polynomial estimator functions (17) as outlined in Section 6. We regularize the estimators using Ridge regression and find the optimal regularization parameter through 3-fold cross-validation for each estimator. In order to satisfy Assumption 1 on the support of the interventions, we sample single interventions and joint interventions on the action variables to match the observational distributions. That is, we sample intervention values from a normal distribution: $A_k^{\mathrm{int}} \sim \mathcal{N}(\hat{\mu}_k, \hat{\sigma}_k^2)$, where $\hat{\mu}_k$ and $\hat{\sigma}_k$ are the empirical mean and standard deviation of $A_k$ in the observational data, respectively. For joint interventions, the intervention values are sampled independently for each action variable, following the same distribution as in the single intervention case. |
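The Experiment Setup row can be illustrated with a minimal sketch: sample intervention values from a normal distribution matched to the observational moments of one action variable, then fit a third-order polynomial estimator regularized by ridge regression with the penalty chosen by 3-fold cross-validation. All variable names, the toy outcome function, and the closed-form ridge solver are illustrative assumptions, not the paper's implementation (see the linked repository for that).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in observational data for one action variable A_k (illustrative).
A_obs = rng.normal(loc=2.0, scale=0.5, size=1000)

# Sample interventions matching the observational moments, as in the paper:
# A_k^int ~ N(mu_hat_k, sigma_hat_k^2).
mu_hat, sigma_hat = A_obs.mean(), A_obs.std()
A_int = rng.normal(loc=mu_hat, scale=sigma_hat, size=600)

# Toy outcome under intervention; the actual target is the paper's Eq. (17).
y = 1.0 + 0.5 * A_int - 0.2 * A_int**2 + rng.normal(scale=0.1, size=A_int.size)

def poly_features(a, degree=3):
    """Design matrix [1, a, a^2, a^3] for a third-order polynomial estimator."""
    return np.vander(a, degree + 1, increasing=True)

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression: (X'X + alpha*I)^-1 X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

def cv_mse(a, y, alpha, folds=3):
    """Mean squared test error of ridge under k-fold cross-validation."""
    fold_id = np.arange(a.size) % folds
    errs = []
    for f in range(folds):
        tr, te = fold_id != f, fold_id == f
        w = ridge_fit(poly_features(a[tr]), y[tr], alpha)
        errs.append(np.mean((poly_features(a[te]) @ w - y[te]) ** 2))
    return float(np.mean(errs))

# Choose the regularization strength by 3-fold CV, then refit on all data.
alphas = np.logspace(-3, 3, 13)
best_alpha = min(alphas, key=lambda al: cv_mse(A_int, y, al))
w = ridge_fit(poly_features(A_int), y, best_alpha)
```

In the paper's setting this single-variable fit would be repeated for each action variable, and joint interventions would draw each action's value independently from its matched normal.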