A Compression Technique for Analyzing Disagreement-Based Active Learning

Authors: Yair Wiener, Steve Hanneke, Ran El-Yaniv

JMLR 2015

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We introduce a new and improved characterization of the label complexity of disagreement-based active learning, in which the leading quantity is the version space compression set size. This quantity is defined as the size of the smallest subset of the training data that induces the same version space. We show various applications of the new characterization, including a tight analysis of CAL and refined label complexity bounds for linear separators under mixtures of Gaussians and axis-aligned rectangles under product densities. The version space compression set size, as well as the new characterization of the label complexity, can be naturally extended to agnostic learning problems, for which we show new speedup results for two well known active learning algorithms. Keywords: active learning, selective sampling, sequential design, statistical learning theory, PAC learning, sample complexity, selective prediction
Researcher Affiliation | Academia | Yair Wiener (EMAIL), Computer Science Department, Technion, Israel Institute of Technology, Haifa 32000, Israel; Steve Hanneke (EMAIL), Princeton, NJ 08542, USA; Ran El-Yaniv (EMAIL), Computer Science Department, Technion, Israel Institute of Technology, Haifa 32000, Israel
Pseudocode | Yes | Algorithm: CAL(n)
0. m ← 0, t ← 0, V_0 ← F
1. While t < n
2.   m ← m + 1
3.   If x_m ∈ DIS(V_{m-1})
4.     Request label y_m; let V_m ← {h ∈ V_{m-1} : h(x_m) = y_m}, t ← t + 1
5.   Else V_m ← V_{m-1}
6. Return any ĥ ∈ V_m
Open Source Code | No | The paper does not contain any explicit statements about releasing code, nor does it provide links to any code repositories.
Open Datasets | No | The paper discusses theoretical learning problems such as "linear separators under mixtures of Gaussians" and "axis-aligned rectangles under product densities" but does not describe experiments using specific, named datasets or provide any access information (links, DOIs, citations to dataset papers) for public datasets.
Dataset Splits | No | The paper is theoretical and does not describe empirical experiments involving datasets. Therefore, there is no mention of dataset splits like training, validation, or test sets.
Hardware Specification | No | The paper is theoretical and does not describe experimental results that would require specific hardware. No hardware specifications are mentioned.
Software Dependencies | No | The paper is theoretical and does not describe any implementation details that would necessitate listing specific software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical, focusing on mathematical characterizations and bounds rather than empirical experiments. As such, it does not include details about an experimental setup, hyperparameters, or training configurations.
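The CAL pseudocode in the table can be sketched in Python. This is a minimal illustration, not the authors' implementation: the hypothesis class (one-dimensional thresholds), the data stream, and the labeling oracle are all assumptions chosen to make the disagreement region easy to compute.

```python
import random

def cal(stream, oracle, thresholds, n):
    """Sketch of CAL: request a label only when the current version
    space disagrees on the point (i.e., x is in DIS(V)).

    stream:     iterable of unlabeled points x
    oracle:     function x -> label in {-1, +1} (queried on demand)
    thresholds: candidate thresholds t defining h_t(x) = +1 iff x >= t
    n:          label budget
    Returns the surviving version space and the number of labels used.
    """
    version_space = list(thresholds)  # V_0 = F (full hypothesis class)
    t = 0
    for x in stream:
        if t >= n:  # label budget exhausted
            break
        # Predictions of all hypotheses still in the version space.
        preds = {1 if x >= th else -1 for th in version_space}
        if len(preds) > 1:  # x in DIS(V): hypotheses disagree, so query
            y = oracle(x)
            version_space = [th for th in version_space
                             if (1 if x >= th else -1) == y]
            t += 1
        # Otherwise x is outside the disagreement region: every
        # hypothesis agrees on its label, so no query is needed.
    return version_space, t

# Illustrative run: target threshold 0.5, 1000 unlabeled points,
# 101 candidate hypotheses, label budget 20.
random.seed(0)
oracle = lambda x: 1 if x >= 0.5 else -1
stream = [random.random() for _ in range(1000)]
thresholds = [i / 100 for i in range(101)]
V, labels_used = cal(stream, oracle, thresholds, n=20)
```

In this toy run the version space shrinks with each queried label, and the number of labels requested stays far below the stream length; the paper's contribution is to bound this label complexity via the version space compression set size.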