Feature Importance Metrics in the Presence of Missing Data

Authors: Henrik Von Kleist, Joshua Wendland, Ilya Shpitser, Carsten Marr

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
Missing data estimation methods cannot be tested on real-world data with real missingness due to the unavailability of ground truth features X(1). To address this limitation, we perform a series of synthetic experiments to illustrate the differences between feature importance metrics, the impact of positivity violations, and the significance of appropriate estimation methods.
Researcher Affiliation: Academia
1 Institute of AI for Health, Helmholtz Munich, German Research Center for Environmental Health, Neuherberg, Germany; 2 TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany; 3 Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA; 4 Faculty of Computer Science, Ruhr University Bochum, Bochum, Germany
Pseudocode: No
The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code: No
The paper does not provide any explicit statement about releasing code or a link to a code repository for the methodology described.
Open Datasets: No
Missing data estimation methods cannot be tested on real-world data with real missingness due to the unavailability of ground truth features X(1). To address this limitation, we perform a series of synthetic experiments to illustrate the differences between feature importance metrics, the impact of positivity violations, and the significance of appropriate estimation methods.
Dataset Splits: Yes
We generate 100,000 data points and split them into 30% for training the classifier, 30% for training the measurement policy, and 40% for testing.
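The 30/30/40 split reported above can be sketched as follows. This is an illustrative sketch only: the function name, the index shuffle, and the fixed seed are our assumptions, not the authors' code.

```python
import random

def three_way_split(n_points, seed=0):
    """Shuffle indices and split them 30% / 30% / 40%.

    Sketch of the split described in the paper: 30% for training the
    classifier, 30% for training the measurement policy, 40% for
    testing. The shuffling and seeding details are assumptions.
    """
    idx = list(range(n_points))
    random.Random(seed).shuffle(idx)
    n_clf = int(0.3 * n_points)
    n_pol = int(0.3 * n_points)
    clf_idx = idx[:n_clf]
    pol_idx = idx[n_clf:n_clf + n_pol]
    test_idx = idx[n_clf + n_pol:]
    return clf_idx, pol_idx, test_idx

clf_idx, pol_idx, test_idx = three_way_split(100_000)
print(len(clf_idx), len(pol_idx), len(test_idx))  # 30000 30000 40000
```

For 100,000 generated points this yields 30,000 / 30,000 / 40,000 disjoint index sets.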
Hardware Specification: No
The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies: No
We used an "impute-then-regress" classifier (Le Morvan et al., 2021) with zero imputation and a temporal convolutional network (TCN) (Bai et al., 2018) to classify labels Y_t. This mentions software components (TCN) but does not specify their version numbers.
Experiment Setup: Yes
The classifier uses four layers, with 32 channels per layer, a batch size of 2,000, dropout rate of 0.2, and a learning rate of 0.001. The detailed configurations for each experiment, including the data-generating process parameters (W, γ) and missingness mechanisms (π), are provided in Table 1.
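The reported hyperparameters can be gathered into a configuration sketch. The key names and the validation helper below are our own conventions; only the numeric values come from the paper, and the TCN implementation itself is not shown.

```python
# Hyperparameters reported in the paper, collected into a config dict.
# Key names are illustrative; the paper does not prescribe this layout.
tcn_config = {
    "num_layers": 4,
    "channels_per_layer": 32,
    "batch_size": 2000,
    "dropout": 0.2,
    "learning_rate": 1e-3,
}

def validate_config(cfg):
    """Basic sanity checks on the hyperparameter ranges (our addition)."""
    assert cfg["num_layers"] > 0
    assert cfg["channels_per_layer"] > 0
    assert cfg["batch_size"] > 0
    assert 0.0 <= cfg["dropout"] < 1.0
    assert cfg["learning_rate"] > 0.0
    return cfg

validate_config(tcn_config)
```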