A Compression Technique for Analyzing Disagreement-Based Active Learning

Authors: Yair Wiener, Steve Hanneke, Ran El-Yaniv

JMLR 2015

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We introduce a new and improved characterization of the label complexity of disagreement-based active learning, in which the leading quantity is the version space compression set size. This quantity is defined as the size of the smallest subset of the training data that induces the same version space. We show various applications of the new characterization, including a tight analysis of CAL and refined label complexity bounds for linear separators under mixtures of Gaussians and axis-aligned rectangles under product densities. The version space compression set size, as well as the new characterization of the label complexity, can be naturally extended to agnostic learning problems, for which we show new speedup results for two well known active learning algorithms. Keywords: active learning, selective sampling, sequential design, statistical learning theory, PAC learning, sample complexity, selective prediction
Researcher Affiliation | Academia | Yair Wiener (EMAIL), Computer Science Department, Technion, Israel Institute of Technology, Haifa 32000, Israel; Steve Hanneke (EMAIL), Princeton, NJ 08542, USA; Ran El-Yaniv (EMAIL), Computer Science Department, Technion, Israel Institute of Technology, Haifa 32000, Israel
Pseudocode | Yes | Algorithm: CAL(n)
0. m ← 0, t ← 0, V_0 ← F
1. While t < n
2.   m ← m + 1
3.   If x_m ∈ DIS(V_{m-1})
4.     Request label y_m; let V_m ← {h ∈ V_{m-1} : h(x_m) = y_m}, t ← t + 1
5.   Else V_m ← V_{m-1}
6. Return any ĥ ∈ V_m
Open Source Code | No | The paper does not contain any explicit statements about releasing code, nor does it provide links to any code repositories.
Open Datasets | No | The paper discusses theoretical learning problems such as "linear separators under mixtures of Gaussians" and "axis-aligned rectangles under product densities" but does not describe experiments using specific, named datasets or provide any access information (links, DOIs, citations to dataset papers) for public datasets.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments involving datasets. Therefore, there is no mention of dataset splits like training, validation, or test sets.
Hardware Specification | No | The paper is theoretical and does not describe experimental results that would require specific hardware. No hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not describe any implementation details that would necessitate listing specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical, focusing on mathematical characterizations and bounds rather than empirical experiments. As such, it does not include details about an experimental setup, hyperparameters, or training configurations.
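The CAL pseudocode in the table can be sketched in Python. This is a minimal illustration, not the authors' implementation: the hypothesis class (one-dimensional thresholds), the data stream, and the labeling oracle are all assumptions chosen to make the disagreement region easy to compute.

```python
import random

def cal(stream, oracle, thresholds, n):
    """Sketch of CAL: request a label only when the current version
    space disagrees on the point (i.e., x is in DIS(V)).

    stream:     iterable of unlabeled points x
    oracle:     function x -> label in {-1, +1} (queried on demand)
    thresholds: candidate thresholds t defining h_t(x) = +1 iff x >= t
    n:          label budget
    Returns the surviving version space and the number of labels used.
    """
    version_space = list(thresholds)  # V_0 = F (full hypothesis class)
    t = 0
    for x in stream:
        if t >= n:  # label budget exhausted
            break
        # Predictions of all hypotheses still in the version space.
        preds = {1 if x >= th else -1 for th in version_space}
        if len(preds) > 1:  # x in DIS(V): hypotheses disagree, so query
            y = oracle(x)
            version_space = [th for th in version_space
                             if (1 if x >= th else -1) == y]
            t += 1
        # Otherwise x is outside the disagreement region: every
        # hypothesis agrees on its label, so no query is needed.
    return version_space, t

# Illustrative run: target threshold 0.5, 1000 unlabeled points,
# 101 candidate hypotheses, label budget 20.
random.seed(0)
oracle = lambda x: 1 if x >= 0.5 else -1
stream = [random.random() for _ in range(1000)]
thresholds = [i / 100 for i in range(101)]
V, labels_used = cal(stream, oracle, thresholds, n=20)
```

In this toy run the version space shrinks with each queried label, and the number of labels requested stays far below the stream length; the paper's contribution is to bound this label complexity via the version space compression set size.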