Black-Box Batch Active Learning for Regression

Authors: Andreas Kirsch

TMLR 2023

Reproducibility

Variable: Result — LLM Response
Research Type: Experimental — "We demonstrate the effectiveness of our approach through extensive experimental evaluations on regression datasets, achieving surprisingly strong performance compared to white-box approaches for deep learning models."
Researcher Affiliation: Academia — Andreas Kirsch, OATML, Department of Computer Science, University of Oxford
Pseudocode: No — The paper describes the methodology using mathematical formulations and textual descriptions but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code: Yes — The code is available at https://github.com/BlackHC/2302.08981.
Open Datasets: Yes — "We use 15 large tabular datasets from the UCI Machine Learning Repository (Dua & Graff, 2017) and the OpenML benchmark suite (Vanschoren et al., 2014) for our experiments."
Dataset Splits: No — Table 2 lists an initial pool-set size and a test-set size per dataset, and the active learning process adds each acquired batch of labels to a training set Dtrain. However, the paper provides no fixed train/validation/test split percentages or sample counts, and refers to no standard splits for reproduction beyond the pool and test sizes.
Hardware Specification: Yes — "We used A100 GPUs with 40GB of GPU memory."
Software Dependencies: No — The paper mentions scikit-learn (Pedregosa et al., 2011) and CatBoost (Dorogush et al., 2018) but does not specify exact version numbers for these or any other software libraries.
Experiment Setup: Yes — "We use the same experimental setup and hyperparameters as Holzmüller et al. (2022). We report the logarithmic RMSE averaged over 5 trials for each dataset and method. For deep learning, we use a small ensemble of 10 models... For random forests, we use the implementation provided in scikit-learn (Pedregosa et al., 2011) with default hyperparameters, that is, using 100 trees per forest... For gradient-boosted decision trees, we use a virtual ensemble of up to 20 members with early stopping using a validation set."
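The pool-based setup described above (a small labeled seed set, a large unlabeled pool, and a held-out test set) can be sketched as follows. This is a minimal illustration: the `make_regression` data and all sizes are assumptions, not the per-dataset values from the paper's Table 2.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the paper instead uses 15 UCI/OpenML tabular datasets.
X, y = make_regression(n_samples=2000, n_features=8, noise=0.1, random_state=0)

# Hold out a fixed test set first, then carve a small labeled seed set out of
# the remaining unlabeled pool (sizes here are illustrative assumptions).
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=500, random_state=0)
X_seed, X_pool, y_seed, y_pool = train_test_split(X_rest, y_rest, train_size=64, random_state=0)

print(len(X_seed), len(X_pool), len(X_test))  # 64 1436 500
```

During active learning, acquired batches would move from the pool into the seed/training set, with the test set untouched throughout.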
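The quoted setup uses scikit-learn's default random forest, which already has 100 trees per forest. As a hedged illustration of black-box ensemble-based selection, the per-tree predictions can serve as a free ensemble whose variance scores pool points; this simple disagreement score is a hypothetical stand-in, not the paper's actual acquisition function.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption; not one of the paper's datasets).
X, y = make_regression(n_samples=300, n_features=8, noise=0.1, random_state=0)
X_pool = make_regression(n_samples=200, n_features=8, noise=0.1, random_state=1)[0]

# Default hyperparameters, i.e. 100 trees per forest, as quoted above.
forest = RandomForestRegressor(random_state=0).fit(X, y)

# Per-tree predictions form a black-box ensemble; their variance over the pool
# is one simple disagreement score (illustrative only).
per_tree = np.stack([tree.predict(X_pool) for tree in forest.estimators_])
scores = per_tree.var(axis=0)
batch_indices = np.argsort(scores)[-10:]  # 10 most-disagreed-on pool points
```

The same pattern applies to the deep ensemble of 10 models and CatBoost's virtual ensemble of up to 20 members: any model that exposes multiple predictions per input can be scored this way without access to its internals.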
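The reported metric can be sketched as below, assuming "logarithmic RMSE" denotes the natural log of the root-mean-squared error; the exact definition follows Holzmüller et al. (2022), and the trial values here are illustrative.

```python
import numpy as np

def log_rmse(y_true, y_pred):
    """Natural log of the RMSE (assumed reading of 'logarithmic RMSE')."""
    err = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.log(np.sqrt(np.mean(err ** 2))))

# Averaged over trials, as the paper averages over 5 trials per dataset/method.
trial_scores = [log_rmse([0.0, 0.0], [1.0, 1.0]),  # RMSE = 1, log-RMSE = 0
                log_rmse([0.0, 0.0], [2.0, 2.0])]  # RMSE = 2, log-RMSE = ln 2
mean_score = float(np.mean(trial_scores))
```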