Margin-Based Active Learning of Classifiers

Authors: Marco Bressan, Nicolò Cesa-Bianchi, Silvio Lattanzi, Andrea Paudice

JMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We study active learning of multiclass classifiers, focusing on the realizable transductive setting. The input is a finite subset X of some metric space, and the concept to be learned is a partition C of X into k classes. The goal is to learn C by querying the labels of as few elements of X as possible. Our main result is that, in very different settings, there exist interesting notions of margin that yield efficient active learning algorithms. First, we consider the case X ⊆ R^m, assuming that each class has an unknown personalized margin separating it from the rest. Second, we consider the case where X is a finite metric space and the classes are convex with margin according to the geodesic distances in the thresholded connectivity graph. In both cases, we give algorithms that learn C exactly, in polynomial time, using O(log n) label queries, where O(·) hides a near-optimal dependence on the dimension of the metric spaces. Our results actually hold for, or can be adapted to, more general settings such as pseudometric and semimetric spaces.
Researcher Affiliation | Collaboration | Marco Bressan (EMAIL), Dept. of Computer Science, Università degli Studi di Milano, Italy; Nicolò Cesa-Bianchi (EMAIL), Dept. of Computer Science, Università degli Studi di Milano, Italy & Politecnico di Milano, Italy; Silvio Lattanzi (EMAIL), Google Research; Andrea Paudice (EMAIL), Dept. of Computer Science, Università degli Studi di Milano, Italy
Pseudocode | Yes | Algorithm 1: Expand Hull(S_C, 1 + γ); Algorithm 2: Cheat Rec(X, O_C, γ); Algorithm 3: m-Rec(X, O_C, γ, d_1, ..., d_k); Algorithm 4: Learn BGC(ε, s); Algorithm 5: MBS(Z, ε); Algorithm 6: Find Cut Edge(π(s_i, s_h)); Algorithm 7: Find Separator(G_ε, u_i, u_j); Algorithm 8: Learn Class(G_ε, ε, s, i); Algorithm 9: Find New Seed(G_ε, R_i, ε, s, u, i); Algorithm 10: Learn HBGC(X, ε, s); Algorithm 11: Is Connected(G, i); Algorithm 12: Get Epsilon(T, i); Algorithm 13: Get Epsilons(G = (X, E, d), k); Algorithm 14: Find New Seed-2(R_i, c, i); Algorithm 15: Learn Class-2(G_ε, ε, i); Algorithm 16: Learn HBGC-2(X, ε)
Open Source Code | No | The paper does not contain any explicit statements or links indicating that source code for the described methodology is publicly available.
Open Datasets | No | The paper defines abstract theoretical settings for active learning, such as "a finite set X from some domain" or "X ⊆ R^m". It does not use or refer to any specific named public datasets for experimental evaluation.
Dataset Splits | No | The paper describes theoretical algorithms and their properties rather than conducting empirical experiments on datasets; therefore, no dataset splits are discussed.
Hardware Specification | No | The paper focuses on theoretical algorithm design and on query-complexity and running-time bounds. It does not describe any experiments that would require specific hardware, so no hardware specifications are provided.
Software Dependencies | No | The paper presents theoretical algorithms and mathematical proofs. It does not mention any specific software libraries, frameworks, or versions required for implementation or experimentation.
Experiment Setup | No | The paper describes theoretical models and algorithms. It does not include experimental setup details such as hyperparameters, training configurations, or system-level settings, as no empirical experiments are reported.
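To make the abstract's query-complexity claim concrete, here is a toy sketch (not the paper's algorithms, whose pseudocode is listed above) of how a separability assumption turns exact transductive learning into binary search: on n points on a line whose labeling is a single unknown threshold, the full labeling is recovered with O(log n) label queries rather than n. All names (`learn_threshold`, `query_label`) are illustrative.

```python
# Toy illustration of margin-style active learning on a line:
# the n points are labeled 0 below an unknown threshold and 1 above it,
# so binary search pins down the boundary with O(log n) oracle queries.

def learn_threshold(points, query_label):
    """Return predicted labels for all of `points` (sorted ascending).

    `query_label(x)` is the label oracle. Assumes the realizable
    two-class setting where some unknown threshold separates 0s from 1s.
    """
    n = len(points)
    # Handle the all-0 and all-1 labelings with one query each.
    if query_label(points[-1]) == 0:
        return [0] * n
    if query_label(points[0]) == 1:
        return [1] * n
    # Invariant: label(points[lo]) == 0 and label(points[hi]) == 1.
    lo, hi = 0, n - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if query_label(points[mid]) == 0:
            lo = mid
        else:
            hi = mid
    # All indices <= lo are 0, all indices >= hi are 1.
    return [0] * (lo + 1) + [1] * (n - lo - 1)
```

On 100 points this uses at most about 2 + ceil(log2 99) ≈ 9 queries, in contrast with the 100 queries a passive learner would need to label every point; the paper's algorithms achieve an analogous exponential saving in far richer settings (personalized margins in R^m, geodesic convexity in metric spaces).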