A Compression Technique for Analyzing Disagreement-Based Active Learning
Authors: Yair Wiener, Steve Hanneke, Ran El-Yaniv
JMLR 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We introduce a new and improved characterization of the label complexity of disagreement-based active learning, in which the leading quantity is the version space compression set size. This quantity is defined as the size of the smallest subset of the training data that induces the same version space. We show various applications of the new characterization, including a tight analysis of CAL and refined label complexity bounds for linear separators under mixtures of Gaussians and axis-aligned rectangles under product densities. The version space compression set size, as well as the new characterization of the label complexity, can be naturally extended to agnostic learning problems, for which we show new speedup results for two well known active learning algorithms. Keywords: active learning, selective sampling, sequential design, statistical learning theory, PAC learning, sample complexity, selective prediction |
| Researcher Affiliation | Academia | Yair Wiener, Computer Science Department, Technion - Israel Institute of Technology, Haifa 32000, Israel; Steve Hanneke, Princeton, NJ 08542, USA; Ran El-Yaniv, Computer Science Department, Technion - Israel Institute of Technology, Haifa 32000, Israel |
| Pseudocode | Yes | Algorithm: CAL(n). 0. m ← 0, t ← 0, V_0 ← F. 1. While t < n: 2. m ← m + 1. 3. If x_m ∈ DIS(V_{m-1}): 4. request label y_m; let V_m ← {h ∈ V_{m-1} : h(x_m) = y_m}, t ← t + 1. 5. Else V_m ← V_{m-1}. 6. Return any ĥ ∈ V_m |
| Open Source Code | No | The paper does not contain any explicit statements about releasing code, nor does it provide links to any code repositories. |
| Open Datasets | No | The paper discusses theoretical learning problems such as "linear separators under mixtures of Gaussians" and "axis-aligned rectangles under product densities" but does not describe experiments using specific, named datasets or provide any access information (links, DOIs, citations to dataset papers) for public datasets. |
| Dataset Splits | No | The paper is theoretical and does not describe empirical experiments involving datasets. Therefore, there is no mention of dataset splits like training, validation, or test sets. |
| Hardware Specification | No | The paper is theoretical and does not describe experimental results that would require specific hardware. No hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any implementation details that would necessitate listing specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical, focusing on mathematical characterizations and bounds rather than empirical experiments. As such, it does not include details about an experimental setup, hyperparameters, or training configurations. |
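The CAL pseudocode in the table can be made concrete for a finite hypothesis class. The sketch below is not from the paper: the 1-D threshold class, the `make_threshold` helper, and the stream construction are hypothetical illustrations, assuming a realizable setting where the target lies in the class. A point's label is requested only when it falls in the disagreement region DIS(V), i.e., when at least two hypotheses in the current version space disagree on it.

```python
import random

def cal(stream, hypotheses, n):
    """Minimal sketch of CAL for a finite hypothesis class.
    `stream` yields (x, label_oracle) pairs; at most n labels are
    requested. Returns some hypothesis in the final version space."""
    version_space = list(hypotheses)  # V_0 = F
    t = 0
    for x, label_oracle in stream:
        if t >= n:
            break
        # x is in DIS(V) iff hypotheses in V disagree on its label
        predictions = {h(x) for h in version_space}
        if len(predictions) > 1:
            y = label_oracle()  # request the label
            version_space = [h for h in version_space if h(x) == y]
            t += 1
        # otherwise x is outside DIS(V): keep V, spend no label

    return version_space[0]

# Hypothetical usage: thresholds on [0, 1], target threshold at 0.5.
def make_threshold(c):
    return lambda x: 1 if x >= c else 0

hypotheses = [make_threshold(i / 100) for i in range(101)]
target = make_threshold(0.5)
random.seed(0)
points = [random.random() for _ in range(200)]
stream = [(x, (lambda x=x: target(x))) for x in points]
h_hat = cal(stream, hypotheses, n=50)
```

Because the target is always retained in the version space in the realizable case, the returned hypothesis agrees with the target on every streamed point, while labels are spent only on points inside the shrinking disagreement interval.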