Feature Importance Metrics in the Presence of Missing Data
Authors: Henrik Von Kleist, Joshua Wendland, Ilya Shpitser, Carsten Marr
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Missing data estimation methods cannot be tested on real-world data with real missingness due to the unavailability of the ground truth features X^(1). To address this limitation, we perform a series of synthetic experiments to illustrate the differences between feature importance metrics, the impact of positivity violations, and the significance of appropriate estimation methods. |
| Researcher Affiliation | Academia | ¹Institute of AI for Health, Helmholtz Munich, German Research Center for Environmental Health, Neuherberg, Germany; ²TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany; ³Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; ⁴Faculty of Computer Science, Ruhr University Bochum, Bochum, Germany |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing code or a link to a code repository for the methodology described. |
| Open Datasets | No | Missing data estimation methods cannot be tested on real-world data with real missingness due to the unavailability of the ground truth features X^(1). To address this limitation, we perform a series of synthetic experiments to illustrate the differences between feature importance metrics, the impact of positivity violations, and the significance of appropriate estimation methods. |
| Dataset Splits | Yes | We generate 100,000 data points and split them into 30% for training the classifier, 30% for training the measurement policy, and 40% for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | We used an "impute-then-regress" classifier (Le Morvan et al., 2021) with zero imputation and a temporal convolutional network (TCN) (Bai et al., 2018) to classify labels Y_t. This mentions software components (TCN) but does not specify their version numbers. |
| Experiment Setup | Yes | The classifier uses four layers, with 32 channels per layer, a batch size of 2,000, dropout rate of 0.2, and a learning rate of 0.001. The detailed configurations for each experiment, including the data-generating process parameters (W,γ) and missingness mechanisms (π), are provided in Table 1. |
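The 30%/30%/40% split reported under "Dataset Splits" can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the variable names, the use of NumPy, and the RNG seed are assumptions.

```python
import numpy as np

# Shuffle 100,000 indices, then carve out the three partitions
# described in the paper: 30% classifier training, 30% measurement
# policy training, 40% testing.
rng = np.random.default_rng(0)

n = 100_000
indices = rng.permutation(n)

n_clf = int(0.3 * n)     # 30% for training the classifier
n_policy = int(0.3 * n)  # 30% for training the measurement policy

clf_idx = indices[:n_clf]
policy_idx = indices[n_clf:n_clf + n_policy]
test_idx = indices[n_clf + n_policy:]  # remaining 40% for testing

print(len(clf_idx), len(policy_idx), len(test_idx))  # 30000 30000 40000
```

Because the partitions are slices of a single permutation, they are disjoint by construction and together cover all 100,000 points.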
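The zero-imputation step of the "impute-then-regress" pipeline mentioned under "Software Dependencies" amounts to replacing missing entries with zero before the classifier sees the features. A minimal sketch, assuming NaN encodes missingness (the helper name and example data are illustrative, not from the source):

```python
import numpy as np

def zero_impute(x: np.ndarray) -> np.ndarray:
    """Replace NaN entries with 0.0, leaving observed values untouched."""
    return np.where(np.isnan(x), 0.0, x)

# Two samples with one missing feature each.
x = np.array([[1.0, np.nan],
              [np.nan, 2.5]])
print(zero_impute(x))  # [[1.  0. ] [0.  2.5]]
```

In the impute-then-regress scheme, the downstream model (here a TCN) is trained on the imputed features, so it can learn to treat the constant fill value as a missingness signal.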