Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures
Authors: Yuan Cao, Quanquan Gu, Mikhail Belkin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this paper, we study this benign overfitting phenomenon of the maximum margin classifier for linear classification problems. Specifically, we consider data generated from sub-Gaussian mixtures, and provide a tight risk bound for the maximum margin linear classifier in the over-parameterized setting. Our results precisely characterize the condition under which benign overfitting can occur in linear classification problems, and improve on previous work. They also have direct implications for over-parameterized logistic regression. |
| Researcher Affiliation | Academia | Yuan Cao, Department of Statistics & Actuarial Science and Department of Mathematics, The University of Hong Kong, EMAIL; Quanquan Gu, Department of Computer Science, University of California, Los Angeles, Los Angeles, CA 90095, USA, EMAIL; Mikhail Belkin, Halicioğlu Data Science Institute, University of California San Diego, La Jolla, CA 92093, USA, EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide a direct link or explicit statement in the main text about the availability of its source code. |
| Open Datasets | No | We consider a model where the feature vectors are generated from a mixture of two sub-Gaussian distributions with means µ and −µ and the same covariance matrix Σ. We consider n training data points (xi, yi) generated independently from the above procedure. |
| Dataset Splits | No | The paper defines training data generation but does not specify any dataset splits (e.g., train/validation/test percentages or counts). |
| Hardware Specification | No | The paper states 'All experiments can be run very efficiently on a standard PC.' but does not provide specific hardware details (e.g., CPU/GPU model, memory). |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers. |
| Experiment Setup | No | As a theoretical paper, it defines a model and assumptions but does not describe an experimental setup with hyperparameters or training configurations for empirical evaluation. |
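The data model quoted in the Open Datasets row (a label y ∈ {±1} and a feature vector x = y·µ plus mean-zero sub-Gaussian noise with covariance Σ) can be sketched as follows. This is a minimal illustration, not code from the paper: Gaussian noise is used as a concrete sub-Gaussian instance, and the function name `sample_mixture` and the square-root factorization of Σ are illustrative choices.

```python
import numpy as np

def sample_mixture(n, mu, sigma_sqrt, rng):
    """Sample n points from the two-component mixture x = y * mu + q.

    mu         : mean vector of the positive component (the negative
                 component has mean -mu).
    sigma_sqrt : a matrix S with S @ S.T = Sigma, so the noise q has
                 covariance Sigma (Gaussian noise as a sub-Gaussian example).
    """
    y = rng.choice([-1, 1], size=n)                      # uniform labels
    q = rng.standard_normal((n, len(mu))) @ sigma_sqrt.T  # noise ~ N(0, Sigma)
    x = y[:, None] * mu + q                              # class-signed mean + noise
    return x, y

# Example usage: over-parameterized regime would take len(mu) >> n.
rng = np.random.default_rng(0)
mu = np.array([2.0, 0.0, 0.0])
x, y = sample_mixture(500, mu, np.eye(3), rng)
```

With this setup the empirical mean of the positive class should be close to µ and that of the negative class close to −µ, matching the model description in the paper's abstract.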