Adjustment for Confounding using Pre-Trained Representations

Authors: Rickmer Schulte, David Rügamer, Thomas Nagler

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the following, we will complement our theoretical results from the previous section with empirical evidence from several experiments. The experiments include both images and text as non-tabular data, which act as the source of confounding in the ATE setting. Further experiments can be found in Appendix D.
Researcher Affiliation | Academia | 1) Department of Statistics, LMU Munich, Munich, Germany; 2) Munich Center for Machine Learning (MCML), Munich, Germany. Correspondence to: Rickmer Schulte <EMAIL>.
Pseudocode | No | The paper describes methods using mathematical formulations and natural language, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code to reproduce the results of the experiments can be found at https://github.com/rickmer-schulte/Pretrained-Causal-Adjust.
Open Datasets | Yes | Text Data: We utilize the IMDb Movie Reviews dataset from Lhoest et al. (2021), consisting of 50,000 movie reviews labeled for sentiment analysis. [...] Image Data: We further use the dataset from Kermany et al. (2018), which contains 5,863 chest X-ray images of children.
Dataset Splits | Yes | Generally, DML is used with sample splitting and two folds for cross-validation. [...] After preprocessing and extraction of pre-trained representations, we sub-sampled 1,000 and 4,000 pre-trained representations for the two confounding setups to make the simulation study tractable. [...] The following experiment is based on 500 sampled images from the X-Ray dataset, where five-layer CNNs are used in the non-pre-trained DML version. [...] DML without pre-training (DML (CNN)) for ATE estimation using 500 (left) and all 3,769 (right) images from the X-Ray dataset.
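The two-fold sample splitting described above is the cross-fitting step of DML: nuisance models are trained on one fold and used to residualize the held-out fold, so that each observation's residuals come from models it did not train. A minimal sketch of this procedure for a partially linear model is below; the `LassoCV` nuisance learners and the function name `dml_ate` are illustrative choices, not the paper's exact estimators.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold


def dml_ate(X, D, Y, n_folds=2, seed=0):
    """DML (partialling-out) treatment-effect estimate with cross-fitting.

    X: confounders, D: treatment, Y: outcome.
    """
    res_d = np.zeros(len(D))
    res_y = np.zeros(len(Y))
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        # Fit nuisance models E[D|X] and E[Y|X] on the training fold only.
        m = LassoCV().fit(X[train_idx], D[train_idx])
        g = LassoCV().fit(X[train_idx], Y[train_idx])
        # Residualize the held-out fold (cross-fitting).
        res_d[test_idx] = D[test_idx] - m.predict(X[test_idx])
        res_y[test_idx] = Y[test_idx] - g.predict(X[test_idx])
    # Neyman-orthogonal score: regress outcome residuals on treatment residuals.
    return float(res_d @ res_y / (res_d @ res_d))
```

With simulated confounded data whose true effect is known, the cross-fitted estimate recovers the effect despite the nuisance models being learned from the same sample.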
Hardware Specification | Yes | All computations were performed on a user PC with an Intel(R) Core(TM) i7-8665U CPU @ 1.90 GHz (8 cores) and 16 GB RAM.
Software Dependencies | No | The paper mentions software components like 'scikit-learn', 'Causal ML (Chen et al., 2020)', 'Double ML (Bach et al., 2022)', and 'Adam for optimization', but it does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | In the Complex Confounding experiments, we also use neural network-based nuisance estimators for DML and the S-Learner. For this purpose, we employed neural networks with a depth of 100 and a width of 50, using ReLU activation and Adam for optimization. [...] The experiment of Figure 7 uses a five-layer CNN with 3 × 3 convolutions, batch normalization, ReLU activation, and max pooling, followed by a model head consisting of fully connected layers with dropout. Training uses Adam optimization with early stopping.
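The quoted nuisance-estimator configuration (100 hidden layers of width 50, ReLU, Adam) can be written down concretely. The sketch below expresses it via scikit-learn's `MLPRegressor`; the paper does not state which framework the authors actually used, so this mapping is an assumption, and `make_nuisance_net` is a hypothetical helper name.

```python
from sklearn.neural_network import MLPRegressor


def make_nuisance_net(depth=100, width=50):
    """MLP matching the quoted setup: `depth` hidden layers of `width`
    units each, ReLU activation, trained with Adam.

    NOTE: illustrative scikit-learn configuration; the authors' exact
    implementation and framework are not specified in the excerpt.
    """
    return MLPRegressor(
        hidden_layer_sizes=(width,) * depth,  # depth 100, width 50
        activation="relu",
        solver="adam",
        max_iter=500,
    )
```

Such a network would then be plugged in as the nuisance learner (for E[Y|X] and E[D|X]) inside the DML cross-fitting loop.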