Active feature acquisition via explainability-driven ranking
Authors: Osman Berke Guney, Ketan Suhaas Saichandran, Karim Elzokm, Ziming Zhang, Vijaya B Kolachalama
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple datasets demonstrate that our approach outperforms current state-of-the-art AFA methods in predictive accuracy and feature acquisition efficiency. These findings highlight the promise of an explainability-driven AFA strategy in scenarios where feature acquisition is a concern. |
| Researcher Affiliation | Academia | (1) Department of Electrical & Computer Engineering, Boston University, MA, USA; (2) Department of Computer Science, Boston University, MA, USA; (3) Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, MA, USA; (4) Department of Electrical & Computer Engineering, Worcester Polytechnic Institute, MA, USA; (5) Faculty of Computing & Data Sciences, Boston University, MA, USA. |
| Pseudocode | Yes | A.2. Pseudocodes: "Below, we provide the pseudocode for our first and second training stages, as well as for the inference stage." Algorithm 1: first-stage training of qπ and fθ; Algorithm 2: second-stage training of qπ and fθ; Algorithm 3: inference stage. |
| Open Source Code | Yes | Our code is available at https://github.com/vkola-lab/icml2025 |
| Open Datasets | Yes | Datasets. We conducted experiments on nine datasets (Table 1): five tabular datasets (Spambase, Metabric, CPS, CTGS, and CKD) and four image datasets (CIFAR-10, CIFAR-100, BloodMNIST, and Imagenette). For image datasets, we partitioned each image into non-overlapping patches. For detailed descriptions of the datasets, we refer readers to the Appendix (A.1). |
| Dataset Splits | No | The paper mentions using 'test set' and 'validation set' in the context of results (e.g., 'performance... on the test set, and from ... on the validation set.'), and refers to the available training samples. However, it does not provide specific percentages, absolute counts, or explicit references to predefined standard splits for any of the datasets used. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions several software components like 'Adam optimizer', 'cosine scheduler', 'CatBoost', and the 'SHAP package', but it does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | During training, we fixed the number of epochs to 200 and 16 for the first and second stages, respectively. We used the Adam optimizer (Kingma & Ba, 2014) and a cosine scheduler (Loshchilov & Hutter, 2017). In qπ, we set the context length ℓ to 4, and the number of heads and layers to 4 and 3, respectively. |
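The cosine scheduler cited in the setup row follows Loshchilov & Hutter (2017). As a minimal sketch of what that schedule computes (the peak learning rate `lr_max` is an assumed placeholder, not a value reported in the paper):

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-3, lr_min=0.0):
    """Cosine-annealed learning rate (Loshchilov & Hutter, 2017):
    decays from lr_max at step 0 to lr_min at total_steps."""
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * step / total_steps)
    )

# Schedule over the paper's 200 first-stage epochs (lr_max assumed).
schedule = [cosine_lr(e, 200) for e in range(201)]
```

At the midpoint (epoch 100) the rate is exactly halfway between `lr_max` and `lr_min`, which is the characteristic shape of this schedule.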