Model-agnostic meta-learners for estimating heterogeneous treatment effects over time
Authors: Dennis Frauen, Konstantin Hess, Stefan Feuerriegel
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our IVW-DR-learner achieves superior performance in our experiments, particularly in regimes with low overlap and long time horizons. ... In this section, we compare our proposed meta-learners empirically. ... We simulate three datasets Dj with j ∈ {1, 2, 3} from different data-generating processes. ... Real-world dataset. We sample n = 3000 patient trajectories of electronic health records over up to T = 10 time points from the MIMIC-III dataset (Johnson et al., 2016). |
| Researcher Affiliation | Academia | Dennis Frauen, Konstantin Hess & Stefan Feuerriegel, LMU Munich; Munich Center of Machine Learning (MCML) |
| Pseudocode | No | The paper describes the methods using mathematical formulations and descriptive text, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/DennisFrauen/CATEMetaLearnersTime. |
| Open Datasets | Yes | Real-world dataset. We sample n = 3000 patient trajectories of electronic health records over up to T = 10 time points from the MIMIC-III dataset (Johnson et al., 2016). |
| Dataset Splits | Yes | We sample a training dataset of size ntrain = 5000 and a test dataset of size ntest = 1000. ... We sample a training dataset of size ntrain = 10000 and a test dataset of size ntest = 1000. |
| Hardware Specification | Yes | For each transformer-based learner, training took approximately 90 seconds using n = 5000 samples and a standard computer with AMD Ryzen 7 Pro CPU and 32GB of RAM. |
| Software Dependencies | No | The paper mentions using a transformer-based architecture (Vaswani et al., 2017) and the Adam optimizer (Kingma & Ba, 2015), but it does not specify version numbers for any software libraries or programming languages. |
| Experiment Setup | Yes | Further details regarding architecture, training, hyperparameters, and runtime are in Appendix E. ... Each block consists of (i) a self-attention mechanism with three attention heads and hidden state dimension dmodel = 30, and (ii) a feed-forward network with hidden layer size dff = 20. Both the (i) self-attention mechanism and (ii) the feed-forward network employ residual connections, which are followed by dropout layers with dropout probability p = 0.1, respectively. ... We employ additional weight decay for the two-stage learners to avoid overfitting during the pseudo-outcome regression. |
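The architectural details quoted in the Experiment Setup row (three attention heads, dmodel = 30, dff = 20, residual connections around both sub-layers) can be sketched as a minimal NumPy reconstruction. This is an illustrative sketch only, not the authors' implementation: the weights are randomly initialized, and the dropout layers (p = 0.1) and any normalization are omitted since dropout only acts during training.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transformer_block(x, rng, d_model=30, n_heads=3, d_ff=20):
    """One encoder block matching the quoted description:
    (i) multi-head self-attention and (ii) a feed-forward network,
    each wrapped in a residual connection. Illustrative random weights."""
    d_head = d_model // n_heads  # 30 / 3 = 10 per head

    # (i) Multi-head self-attention with residual connection.
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    heads = []
    for h in range(n_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        q, k, v = x @ Wq[:, sl], x @ Wk[:, sl], x @ Wv[:, sl]
        attn = softmax(q @ k.T / np.sqrt(d_head))  # (T, T) attention weights
        heads.append(attn @ v)
    x = x + np.concatenate(heads, axis=-1) @ Wo  # residual connection

    # (ii) Feed-forward network (hidden size d_ff) with residual connection.
    W1 = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
    W2 = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)
    x = x + np.maximum(x @ W1, 0.0) @ W2  # ReLU MLP, residual connection
    return x

rng = np.random.default_rng(0)
seq = rng.standard_normal((10, 30))  # T = 10 time points, d_model = 30
out = transformer_block(seq, rng)    # out.shape == (10, 30)
```

The shapes mirror the quoted setup (sequences of up to T = 10 time points embedded at dimension 30); in practice these blocks would be trained end-to-end with Adam and weight decay as the paper describes.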