Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings
Authors: Henrik von Kleist, Alireza Zamanian, Ilya Shpitser, Narges Ahmidi
JMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 8, we present synthetic data experiments that exemplify the improved data efficiency and reduced positivity requirements of the semi-offline RL estimators. Our experiments also show that biased evaluation methods commonly used in the AFA literature can lead to detrimental conclusions regarding the performance of AFA agents. Deploying such methods without caution may pose significant risks to patients' lives. We end the paper with a Discussion (Section 9) and Conclusion (Section 10). |
| Researcher Affiliation | Collaboration | Henrik von Kleist (1,2,3), Alireza Zamanian (2,4), Ilya Shpitser (3), Narges Ahmidi (1,3,4). (1) Institute of AI for Health, Helmholtz Munich, German Research Center for Environmental Health, 85764 Neuherberg, Germany; (2) TUM School of Computation, Information and Technology, Technical University of Munich, 85748 Garching, Germany; (3) Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA; (4) Fraunhofer Institute for Cognitive Systems IKS, 80686 Munich, Germany |
| Pseudocode | No | The paper describes methods primarily through mathematical formulations and textual explanations of concepts and algorithms, but it does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We evaluate the different estimators on synthetic data sets where the missingness is artificially induced to allow the comparison with the ground truth. For the experiments, we defined a superfeature as a feature that comprises multiple subfeatures, which are acquired jointly and which have a single cost. Furthermore, we assumed a subset of features is available at no cost (free features) and set fixed acquisition costs c_acq for the remaining features. A prediction was to be performed at each time step, which corresponds to the setting described in Appendix K. We chose misclassification costs such that good policies must find a balance between the feature acquisition cost and the predictive value of the features. We evaluated and compared the described methods on synthetic data sets with and without violation of either the NDE or NUC assumption. In experiments where the NDE assumption holds, the features are distributed according to: X^t_{(1),i} = γ_i X^{t−1}_{(1),i} + (1 − γ_i)ϵ_i if t > 0, and X^t_{(1),i} = ϵ_i if t = 0, where ϵ_i ~ N(0, σ). In experiments with a violation of the NDE assumption, the unobserved variables U were distributed according to: U^t_i = γ_i U^{t−1}_i + (1 − γ_i)ϵ_i + 0.5 Σ_i A^{t−1}_i if t > 1; U^t_i = γ_i U^{t−1}_i + (1 − γ_i)ϵ_i if t = 1; and U^t_i = ϵ_i if t = 0. The labels are distributed according to: p(Y^t = 1) = 1 if ζ_1 Σ_i W_i X^t_{(1),i} + ζ_2 Σ_i W_i X^{t−1}_{(1),i} > 0, and 0.3 otherwise. This choice for Y simulates a scenario where not all data points are equally easy to classify. The retrospective policy π_β follows different logistic models depending on whether a MAR assumption (NUC holds) or MNAR assumption (NUC is violated) is assumed, as specified in Table 3. To evaluate the convergence of different estimators when the NDE assumption holds, we consider the average cost of running the AFA agent on the data set over all data points in the ground truth test set (without missingness) as the true expected cost J. When NDE is violated, we sample the ground truth data generating process while running the agent and do so the same number of times as there are data points in the test set. We performed five different experiments: ... For full experiment configurations for the acquisition processes, please see Table 4. |
| Dataset Splits | Yes | Sample size n = 100,000, divided into a 30% training set (for agent and classifier), a 30% nuisance-function training set, and a 40% test set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud resources with specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'logistic regression' models and a 'proximal policy optimization (PPO) RL agent' as methods, but does not specify any software libraries, frameworks (e.g., PyTorch, TensorFlow), or their version numbers. It also refers to 'impute-then-regress classifier (Le Morvan et al., 2021)' but without detailing specific software versions for its implementation. |
| Experiment Setup | Yes | We used an impute-then-regress classifier (Le Morvan et al., 2021) with unconditional mean imputation and a logistic regression classifier for the classification task and trained it on the available and further randomly subsampled data (where p(A^t_i = 1) = 0.5). We tested random and fixed acquisition policies that acquire each costly feature with a 50% or 100% probability. Furthermore, we evaluated a proximal policy optimization (PPO) RL agent (Schulman et al., 2017), which was trained on the semi-offline sampling distribution p using π_α as the semi-offline sampling policy, but without adjustment for the blocking of actions. ... PPO (learning rate: 0.0001, number of layers: 2, hidden layer neurons per layer: 64, hidden layer activation function: tanh). Nuisance functions π̂_β (logistic regression), Q̂_Semi (Ξ = , learning rate: 0.001, number of layers: 2, hidden layer neurons per layer: 16, hidden layer activation function: ReLU). |
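The synthetic data-generating process quoted in the Open Datasets row (an AR(1)-style feature mixture plus a thresholded label rule) can be sketched as follows. This is a minimal illustration of the NDE-holds case only; the dimensions `n`, `T`, `d` and the parameter values for `gamma`, `sigma`, `W`, `zeta1`, `zeta2` are hypothetical placeholders, not values reported by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and parameters (not specified in the review table).
n, T, d = 1000, 4, 5
gamma = rng.uniform(0.5, 0.9, size=d)  # per-feature mixing weights gamma_i
sigma = 1.0                            # noise scale: eps_i ~ N(0, sigma)
W = rng.normal(size=d)                 # label weights W_i (hypothetical)
zeta1, zeta2 = 1.0, 0.5                # label coefficients (hypothetical)

X = np.zeros((n, T, d))
Y = np.zeros((n, T), dtype=int)
for t in range(T):
    eps = rng.normal(0.0, sigma, size=(n, d))
    if t == 0:
        X[:, t] = eps  # X^0_i = eps_i
    else:
        # X^t_i = gamma_i * X^{t-1}_i + (1 - gamma_i) * eps_i  (NDE holds)
        X[:, t] = gamma * X[:, t - 1] + (1 - gamma) * eps
    # p(Y^t = 1) = 1 if zeta1 * sum_i W_i X^t_i + zeta2 * sum_i W_i X^{t-1}_i > 0,
    # and 0.3 otherwise (so some points are deterministically labeled, others noisy).
    prev = X[:, t - 1] if t > 0 else np.zeros((n, d))
    score = zeta1 * (X[:, t] @ W) + zeta2 * (prev @ W)
    Y[:, t] = np.where(score > 0, 1, (rng.random(n) < 0.3).astype(int))
```

The piecewise label rule is what makes evaluation interesting: points with a positive score are trivially classifiable given the right features, while the rest carry irreducible label noise, so an AFA agent must weigh acquisition cost against the predictive value of each feature.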