Targeted Active Learning for Bayesian Decision-Making

Authors: Louis Filstroff, Iiris Sundin, Petrus Mikkola, Aleksei Tiulpin, Juuso Kylmäoja, Samuel Kaski

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare our targeted active learning strategy to existing alternatives on both simulated and real data and show improved performance in decision-making accuracy. [...] We empirically demonstrate the advantages of the proposed method with respect to existing AL baselines, both in simulated and real-world experiments.
Researcher Affiliation | Academia | Louis Filstroff (Univ. Lille, CNRS, Centrale Lille, UMR 9189 CRIStAL, F-59000 Lille, France); Iiris Sundin (Department of Computer Science, Aalto University, Finland); Petrus Mikkola (Department of Computer Science, Aalto University, Finland; University of Helsinki, Finland); Aleksei Tiulpin (Research Unit of Health Sciences and Technology, University of Oulu, Finland); Juuso Kylmäoja (Department of Computer Science, Aalto University, Finland); Samuel Kaski (Department of Computer Science, Aalto University, Finland)
Pseudocode | Yes | Algorithm 1: Estimating the criterion Eq. (10) for (x_j, d_j) ∈ U
Open Source Code | No | The paper mentions "Python implementation is carried out with the framework GPy⁴ (open-source, under BSD licence)." This refers to a third-party framework that the authors used, not to their own implementation of the described methodology.
Open Datasets | Yes | IHDP dataset² (Hill, 2011), a semi-synthetic dataset which consists of 747 patients with 25 covariates. ... ² Available online as part of the supplementary material of Hill (2011). ... Osteoarthritis Initiative (OAI) database³ ... ³ https://nda.nih.gov/oai/
Dataset Splits | No | The paper states: "each considered dataset is randomly split into a training set D, query set U, and a test set." and "Experiments are run with a starting training set of size 100 for the synthetic dataset and the OAI dataset, and of size 50 for the IHDP dataset." It also mentions a test population of N_t = 50 points. However, it gives neither the percentages or absolute counts for the query set nor the total size of each dataset, so the split proportions cannot be deduced and the data partitioning cannot be fully reproduced.
Hardware Specification | No | "All experiments were run on a high-performance computing cluster." This statement is too general: it gives no GPU/CPU models, memory sizes, or other hardware specifications.
Software Dependencies | Yes | "Python implementation is carried out with the framework GPy⁴ (open-source, under BSD licence)."
Experiment Setup | No | "GP hyperparameters (variance, lengthscales), as well as the noise variance, are estimated with maximum marginal likelihood." This describes how the hyperparameters are found but gives neither their values nor other training-related hyperparameters such as learning rates, batch sizes, or number of epochs.
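
The random partition quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_indices`, the seed, and the choice to treat every remaining point as the query set U are assumptions, since the paper does not report the query-set size. The figures used (747 IHDP patients, a starting training set of 50, a test population of 50) are taken from the rows above; applying the N_t = 50 test population to IHDP specifically is also an assumption.

```python
import random


def split_indices(n_total, n_train, n_test, seed=0):
    """Randomly partition range(n_total) into training, query, and test index lists.

    Hypothetical helper: the query pool is assumed to be whatever remains
    after the training and test sets are drawn.
    """
    if n_train + n_test > n_total:
        raise ValueError("n_train + n_test exceeds n_total")
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    indices = list(range(n_total))
    rng.shuffle(indices)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    query = indices[n_train + n_test:]  # assumed: all leftover points form U
    return train, query, test


# IHDP-sized example: 747 patients, 50 starting training points, 50 test points.
train_idx, query_idx, test_idx = split_indices(n_total=747, n_train=50, n_test=50)
print(len(train_idx), len(query_idx), len(test_idx))
```

Recording the seed and the three sizes alongside the results would be enough to make such a partition repeatable.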