Black-Box Batch Active Learning for Regression
Authors: Andreas Kirsch
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our approach through extensive experimental evaluations on regression datasets, achieving surprisingly strong performance compared to white-box approaches for deep learning models. |
| Researcher Affiliation | Academia | Andreas Kirsch EMAIL OATML, Department of Computer Science University of Oxford |
| Pseudocode | No | The paper describes the methodology using mathematical formulations and textual descriptions but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/BlackHC/2302.08981. |
| Open Datasets | Yes | We use 15 large tabular datasets from the UCI Machine Learning Repository (Dua & Graff, 2017) and the Open ML benchmark suite (Vanschoren et al., 2014) for our experiments |
| Dataset Splits | No | The paper reports an 'Initial pool set size' and 'Test set size' for each dataset in Table 2, and describes an active learning process in which labels are acquired for batches and added to a training set Dtrain. However, beyond the pool and test sizes, it does not provide specific train/validation/test split percentages or sample counts, nor does it refer to standard splits that would allow exact reproduction. |
| Hardware Specification | Yes | We used A100 GPUs with 40GB of GPU memory. |
| Software Dependencies | No | The paper mentions 'scikit-learn (Pedregosa et al., 2011)' and 'CatBoost (Dorogush et al., 2018)' but does not specify exact version numbers for these or other software libraries. |
| Experiment Setup | Yes | We use the same experimental setup and hyperparameters as Holzmüller et al. (2022). We report the logarithmic RMSE averaged over 5 trials for each dataset and method. For deep learning, we use a small ensemble of 10 models... For random forests, we use the implementation provided in scikit-learn (Pedregosa et al., 2011) with default hyperparameters, that is using 100 trees per forest... For gradient-boosted decision trees, we use a virtual ensemble of up to 20 members with early stopping using a validation set. |
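The active-learning process and evaluation metric described in the table rows above (acquire a batch of labels, add it to the training set Dtrain, retrain, and report logarithmic RMSE) can be sketched as follows. This is a minimal, hypothetical illustration: the function names are invented, random acquisition stands in for the paper's black-box batch acquisition method, and the trivial model stands in for the paper's ensembles.

```python
import math
import random

def log_rmse(preds, targets):
    """Logarithmic RMSE, the metric the paper reports (averaged over trials)."""
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)
    return math.log(math.sqrt(mse))

def active_learning_loop(pool, oracle, train_model,
                         batch_size=10, rounds=5, seed=0):
    """Sketch of the batch active-learning loop: each round, acquire a
    batch from the unlabeled pool, label it via the oracle, add it to
    the training set d_train, and retrain the model.

    Random acquisition is used here purely as a placeholder for the
    paper's acquisition function."""
    rng = random.Random(seed)
    pool = list(pool)   # unlabeled pool set
    d_train = []        # labeled training set Dtrain
    models = []
    for _ in range(rounds):
        batch = rng.sample(pool, min(batch_size, len(pool)))
        for x in batch:
            pool.remove(x)
            d_train.append((x, oracle(x)))  # acquire the label
        models.append(train_model(d_train))
    return d_train, models
```

With a pool of 100 points, a batch size of 10, and 5 acquisition rounds, the loop ends with 50 labeled training points, mirroring the incremental growth of Dtrain described in the paper's setup.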