Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Achieving Minimax Rates in Pool-Based Batch Active Learning
Authors: Claudio Gentile, Zhilei Wang, Tong Zhang
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We theoretically investigate batch active learning in the practically relevant scenario where the unlabeled pool of data is available beforehand (pool-based active learning). We analyze a novel stage-wise greedy algorithm and show that, as a function of the label complexity, the excess risk of this algorithm matches the known minimax rates in standard statistical learning settings. Our results also exhibit a mild dependence on the batch size. These are the first theoretical results that employ careful trade-offs between informativeness and diversity to rigorously quantify the statistical performance of batch active learning in the pool-based scenario. |
| Researcher Affiliation | Collaboration | 1Google Research, New York 2Citadel Securities, New York 3The Hong Kong University of Science and Technology, Hong Kong. |
| Pseudocode | Yes | Algorithm 1: Pool-based batch active learning algorithm for linear models. |
| Open Source Code | No | The paper does not mention the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper discusses a 'pool P of T unlabeled instances x1, ..., xT ∈ X' for theoretical analysis but does not mention a specific, publicly available dataset used for empirical training. |
| Dataset Splits | No | The paper is theoretical and does not discuss training/validation/test dataset splits. |
| Hardware Specification | No | The paper does not mention any specific hardware specifications used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | No | The paper focuses on theoretical algorithms and analysis, and does not include details about an empirical experimental setup such as hyperparameters or system-level training settings. |