Model-agnostic meta-learners for estimating heterogeneous treatment effects over time

Authors: Dennis Frauen, Konstantin Hess, Stefan Feuerriegel

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our IVW-DR-learner achieves superior performance in our experiments, particularly in regimes with low overlap and long time horizons. ... In this section, we compare our proposed meta-learners empirically. ... We simulate three datasets D_j with j ∈ {1, 2, 3} from different data-generating processes. ... Real-world dataset. We sample n = 3000 patient trajectories of electronic health records over up to T = 10 time points from the MIMIC-III dataset (Johnson et al., 2016).
Researcher Affiliation | Academia | Dennis Frauen, Konstantin Hess & Stefan Feuerriegel; LMU Munich; Munich Center of Machine Learning (MCML)
Pseudocode | No | The paper describes the methods using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/DennisFrauen/CATEMetaLearnersTime.
Open Datasets | Yes | Real-world dataset. We sample n = 3000 patient trajectories of electronic health records over up to T = 10 time points from the MIMIC-III dataset (Johnson et al., 2016).
Dataset Splits | Yes | We sample a training dataset of size n_train = 5000 and a test dataset of size n_test = 1000. ... We sample a training dataset of size n_train = 10000 and a test dataset of size n_test = 1000.
Hardware Specification | Yes | For each transformer-based learner, training took approximately 90 seconds using n = 5000 samples and a standard computer with an AMD Ryzen 7 Pro CPU and 32GB of RAM.
Software Dependencies | No | The paper mentions using a transformer-based architecture (Vaswani et al., 2017) and the Adam optimizer (Kingma & Ba, 2015), but it does not specify version numbers for any software libraries or programming languages.
Experiment Setup | Yes | Further details regarding architecture, training, hyperparameters, and runtime are in Appendix E. ... Each block consists of (i) a self-attention mechanism with three attention heads and hidden state dimension d_model = 30, and (ii) a feed-forward network with hidden layer size d_ff = 20. Both the (i) self-attention mechanism and (ii) the feed-forward network employ residual connections, each followed by a dropout layer with dropout probability p = 0.1. ... We employ additional weight decay for the two-stage learners to avoid overfitting during the pseudo-outcome regression.
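The quoted architecture (three attention heads, d_model = 30, feed-forward hidden size d_ff = 20, residual connections) can be sketched as a single transformer block. This is a minimal NumPy sketch for illustration, not the authors' implementation: the weight initialization, the ReLU activation, and the omission of layer normalization and dropout (inference mode) are all assumptions beyond what the paper's quote states.

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, D_FF, N_HEADS = 30, 20, 3   # values quoted from the paper
D_HEAD = D_MODEL // N_HEADS          # assumed per-head dimension: 10

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(X, Wq, Wk, Wv):
    # X: (seq_len, d_model); one scaled dot-product attention head
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(D_HEAD)
    return softmax(scores) @ V  # (seq_len, d_head)

def transformer_block(X, params):
    # (i) multi-head self-attention with a residual connection
    heads = [attention_head(X, *hp) for hp in params["heads"]]
    attn = np.concatenate(heads, axis=-1) @ params["Wo"]
    X = X + attn  # residual; dropout (p = 0.1) omitted at inference
    # (ii) feed-forward network (hidden size d_ff) with a residual connection
    ff = np.maximum(X @ params["W1"], 0.0) @ params["W2"]
    return X + ff

params = {
    "heads": [
        tuple(rng.normal(scale=0.1, size=(D_MODEL, D_HEAD)) for _ in range(3))
        for _ in range(N_HEADS)
    ],
    "Wo": rng.normal(scale=0.1, size=(D_MODEL, D_MODEL)),
    "W1": rng.normal(scale=0.1, size=(D_MODEL, D_FF)),
    "W2": rng.normal(scale=0.1, size=(D_FF, D_MODEL)),
}

X = rng.normal(size=(10, D_MODEL))  # e.g. T = 10 time steps
out = transformer_block(X, params)
print(out.shape)  # (10, 30): sequence length preserved, d_model preserved
```

A time-series implementation would additionally use a causal attention mask so that each time step attends only to its past, plus the dropout layers described in the quote during training.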