Active feature acquisition via explainability-driven ranking

Authors: Osman Berke Guney, Ketan Suhaas Saichandran, Karim Elzokm, Ziming Zhang, Vijaya B Kolachalama

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on multiple datasets demonstrate that our approach outperforms current state-of-the-art AFA methods in predictive accuracy and feature acquisition efficiency. These findings highlight the promise of an explainability-driven AFA strategy in scenarios where feature acquisition is a concern."
Researcher Affiliation | Academia | "1 Department of Electrical & Computer Engineering, Boston University, MA, USA; 2 Department of Computer Science, Boston University, MA, USA; 3 Department of Medicine, Boston University Chobanian & Avedisian School of Medicine, MA, USA; 4 Department of Electrical & Computer Engineering, Worcester Polytechnic Institute, MA, USA; 5 Faculty of Computing & Data Sciences, Boston University, MA, USA."
Pseudocode | Yes | "A.2. Pseudocodes. Below, we provide the pseudocode for our first and second training stages, as well as for the inference stage." Algorithm 1 gives the first-stage training of qπ and fθ, Algorithm 2 the second-stage training of qπ and fθ, and Algorithm 3 the inference stage.
Open Source Code | Yes | "Our code is available at https://github.com/vkola-lab/icml2025"
Open Datasets | Yes | "Datasets. We conducted experiments on nine datasets (Table 1): five tabular datasets (Spambase, Metabric, CPS, CTGS, and CKD) and four image datasets (CIFAR-10, CIFAR-100, BloodMNIST, and Imagenette). For image datasets, we partitioned each image into non-overlapping patches. For detailed descriptions of the datasets, we refer readers to the Appendix (A.1)."
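The patch partitioning described for the image datasets can be sketched as follows. This is a minimal NumPy illustration, not the paper's code; the helper name and the 8×8 patch size are assumptions.

```python
import numpy as np

def partition_into_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping square patches.

    Returns an array of shape (num_patches, patch_size, patch_size, C).
    H and W must be divisible by patch_size.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    return (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)          # group by (row block, col block)
             .reshape(-1, patch_size, patch_size, c)
    )

# A 32x32 CIFAR-style image split into 8x8 patches yields 16 patches.
img = np.arange(32 * 32 * 3).reshape(32, 32, 3)
patches = partition_into_patches(img, 8)
print(patches.shape)  # (16, 8, 8, 3)
```

Each patch can then be treated as a single acquirable "feature," which is what makes the image datasets compatible with a feature-acquisition setting.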
Dataset Splits | No | The paper refers to a 'test set' and a 'validation set' when reporting results (e.g., 'performance... on the test set, and from ... on the validation set.') and to the available training samples, but it does not provide specific percentages, absolute counts, or explicit references to predefined standard splits for any of the datasets used.
Hardware Specification | No | The paper does not describe the hardware used to run its experiments, such as specific GPU or CPU models.
Software Dependencies | No | The paper mentions several software components ('Adam optimizer', 'cosine scheduler', 'CatBoost', and the 'SHAP' package), but it does not provide specific version numbers for any of these dependencies.
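For context on the SHAP-based ranking the paper relies on: Shapley-value attributions score each feature's contribution, and ranking features by attribution magnitude yields an acquisition order. The brute-force sketch below is an illustration only, not the paper's method or the SHAP package's optimized estimators; the linear model and inputs are made up for the example.

```python
from itertools import combinations
from math import comb

def exact_shapley(f, x, baseline):
    """Exact Shapley attributions for model f at input x.

    Features absent from a coalition are fixed at their baseline value.
    Exponential in the number of features, so only usable for tiny d.
    """
    d = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(d)]
        return f(z)

    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            for s in combinations(others, k):
                # Classic Shapley weight |S|!(d-|S|-1)!/d! = 1/(d * C(d-1, |S|))
                weight = 1.0 / (d * comb(d - 1, k))
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi

# Toy linear model: for f(z) = w.z, phi_i = w_i * (x_i - baseline_i).
w = [2.0, -1.0, 0.5]
f = lambda z: sum(wi * zi for wi, zi in zip(w, z))
phi = exact_shapley(f, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])

# Ranking features by |Shapley value| gives a candidate acquisition order.
order = sorted(range(3), key=lambda i: -abs(phi[i]))
print(order)  # [0, 1, 2]
```

In practice the SHAP package approximates these values efficiently; the point of the sketch is only the ranking step that turns attributions into an acquisition order.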
Experiment Setup | Yes | "During training, we fixed the number of epochs to 200 and 16 for the first and second stage, respectively. We used the Adam optimizer (Kingma & Ba, 2014) and a cosine scheduler (Loshchilov & Hutter, 2017). In qπ, we set the context length ℓ to 4, and the number of heads and layers to 4 and 3, respectively."
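The cosine scheduler the setup cites follows the annealing rule of Loshchilov & Hutter (2017): η(t) = η_min + ½(η_max − η_min)(1 + cos(πt/T)). A minimal sketch, assuming illustrative learning-rate values not taken from the paper:

```python
import math

def cosine_lr(step, total_steps, lr_max, lr_min=0.0):
    """Cosine-annealed learning rate (Loshchilov & Hutter, 2017)."""
    return lr_min + 0.5 * (lr_max - lr_min) * (
        1 + math.cos(math.pi * step / total_steps)
    )

# Over the 200 first-stage epochs the rate decays smoothly to lr_min.
schedule = [cosine_lr(t, 200, lr_max=1e-3) for t in range(201)]
print(schedule[0])    # 0.001
print(schedule[200])  # 0.0
```

Frameworks such as PyTorch provide this as a built-in scheduler (e.g. `CosineAnnealingLR`); the formula above is what such schedulers compute per step.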