Active Nearest-Neighbor Learning in Metric Spaces
Authors: Aryeh Kontorovich, Sivan Sabato, Ruth Urner
JMLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We propose a pool-based non-parametric active learning algorithm for general metric spaces, called MArgin Regularized Metric Active Nearest Neighbor (MARMANN), which outputs a nearest-neighbor classifier. We give prediction error guarantees that depend on the noisy-margin properties of the input sample, and are competitive with those obtained by previously proposed passive learners. We prove that the label complexity of MARMANN is significantly lower than that of any passive learner with similar error guarantees. MARMANN is based on a generalized sample compression scheme, and a new label-efficient active model-selection procedure. |
| Researcher Affiliation | Academia | Aryeh Kontorovich EMAIL Sivan Sabato EMAIL Department of Computer Science, Ben-Gurion University of the Negev, Beer Sheva 8499000, Israel; Ruth Urner EMAIL Lassonde School of Engineering, EECS Department, York University, Toronto, ON, Canada |
| Pseudocode | Yes | Algorithm 1: MARMANN (MArgin Regularized Metric Active Nearest Neighbor); Algorithm 2: GenerateNNSet(t, I, δ); Algorithm 3: EstBer(θ, β, δ); Algorithm 4: SelectScale(δ) |
| Open Source Code | No | We note that implementing even this passive component efficiently is far from trivial; this formed the core of a Master's thesis (Korsunsky, 2017). The remaining obstacle to making our algorithm fully practical is the magnitude of some of the constants. We believe these to be artifacts of the proof and intend to bring them down to manageable values in future work. |
| Open Datasets | No | The paper focuses on theoretical development and proofs for an algorithm in general metric spaces. It does not mention the use of any specific datasets for experimental evaluation. |
| Dataset Splits | No | The paper does not describe any experimental evaluation using specific datasets, and therefore no information on dataset splits is provided. |
| Hardware Specification | No | The paper is theoretical in nature, focusing on algorithm design and proofs, and does not describe any experimental setup or hardware used. |
| Software Dependencies | No | The paper is theoretical, presenting an algorithm and its guarantees. It does not describe an experimental implementation with software dependencies. |
| Experiment Setup | No | The paper details a theoretical algorithm and its mathematical guarantees. It does not include an experimental section with specific setup details like hyperparameters or training configurations. |
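Since the paper provides no implementation, the flavor of a pool-based active learner that outputs a nearest-neighbor classifier can be illustrated with a toy sketch. This is not MARMANN: the names (`active_1nn`, `oracle`, `budget`) and the farthest-first query heuristic are illustrative assumptions, far simpler than the paper's compression-based scale selection and label-efficient model selection.

```python
import math

def dist(a, b):
    # Euclidean distance; in principle any metric on the space works.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def active_1nn(pool, oracle, budget):
    """Toy pool-based active learner returning a 1-NN classifier.

    Greedily queries the pool point farthest from the labeled set
    (a farthest-first heuristic), then predicts with the nearest
    labeled neighbor. Purely illustrative; MARMANN's guarantees
    rest on a very different margin-regularized procedure.
    """
    labeled = {}  # point (tuple) -> queried label
    labeled[pool[0]] = oracle(pool[0])  # seed with an arbitrary point
    while len(labeled) < min(budget, len(pool)):
        candidates = [p for p in pool if p not in labeled]
        # Query the unlabeled point farthest from every labeled point.
        nxt = max(candidates, key=lambda p: min(dist(p, q) for q in labeled))
        labeled[nxt] = oracle(nxt)

    def predict(x):
        nearest = min(labeled, key=lambda q: dist(x, q))
        return labeled[nearest]

    return predict

# Usage: points on a line, labeled by sign; only 4 labels are queried.
pool = [(float(i),) for i in range(-5, 6)]
oracle = lambda p: 1 if p[0] >= 0 else -1
clf = active_1nn(pool, oracle, budget=4)
```

The sketch queries far fewer labels than the pool size, which is the qualitative point of the paper's label-complexity result, though without any of its noisy-margin error guarantees.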