Classification vs regression in overparameterized regimes: Does the loss function matter?

Authors: Vidya Muthukumar, Adhyyan Narang, Vignesh Subramanian, Mikhail Belkin, Daniel Hsu, Anant Sahai

JMLR 2021

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. Figure 1: correspondence between the SVM and minimum-ℓ2-norm interpolation, illustrated by Fourier features on regularly spaced training data with 10% label noise (for various rates of feature scaling corresponding to λ_k = 1/k^m in the optimization, adjusting the preference for lower frequencies as given in Definition 19 in Appendix A). Figure 3: experimental illustration of Theorem 11 for Gaussian features: the fraction of training points that are support vectors increases as effective overparameterization increases. Figure 4: comparison of test classification and regression error for solutions obtained by minimizing different choices of training loss on the bi-level ensemble; for both figures, the parameters (p = 3/2, r = 1/2) are fixed. Section 6, examining margin-based explanations for generalization: "Through simple experiments, we demonstrate that margin-based generalization bounds are uninformative in sufficiently overparameterized settings." Figure 5 plots the isotropic case, and Figure 6 plots the weak-features case.
Researcher Affiliation: Academia. Vidya Muthukumar (EMAIL), Electrical and Computer Engineering and Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; Adhyyan Narang (EMAIL), Department of Electrical and Computer Engineering, University of Washington, Seattle, WA 98115, USA; Vignesh Subramanian (EMAIL), Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA; Mikhail Belkin (EMAIL), Halicioğlu Data Science Institute, UC San Diego, La Jolla, CA 92093, USA; Daniel Hsu (EMAIL), Department of Computer Science and Data Science Institute, Columbia University, New York, NY 10027, USA; Anant Sahai (EMAIL), Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, USA.
Pseudocode: No. The paper describes methods using mathematical formulations and textual descriptions rather than structured pseudocode or algorithm blocks.
Open Source Code: No. The paper does not contain any explicit statements about the release of source code or links to a code repository for the methodology described.
Open Datasets: No. The paper defines and uses various synthetic data models (e.g., "Gaussian features", "Isotropic ensemble", "Bi-level ensemble", "Weak features ensemble", "Polynomial decay of eigenvalues ensemble") for theoretical analysis and simulation. It does not utilize or provide access information for any pre-existing public datasets.
Dataset Splits: No. The paper describes the generation and sampling of training and test data for its theoretical analysis and simulations, such as "training data {(X_i, Y_i)}_{i=1}^n" and "n_test test samples of data drawn without any label noise". However, it does not provide specific split percentages, fixed dataset partitions, or references to predefined splits of a larger, pre-existing dataset.
Hardware Specification: No. The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory) used to run the experiments or simulations.
Software Dependencies: No. The paper does not specify any software dependencies with version numbers used for the implementation or experiments.
Experiment Setup: Yes. For both figures, the parameters (p = 3/2, r = 1/2) are fixed. On the left, n = 529 and d = 12167 are fixed. Here, the dashed green curve corresponds to α̂_{2,real} (Equation 3b), the orange curve to α̂_{2,binary} (Equation 3a), and the solid blue curve to α̂_{SVM} (Equation 4); the black lines demarcate the regimes from Theorem 13. On the right, d varies as n^{3/2}. Definition 4 (Isotropic ensemble(n, d)): the isotropic ensemble, parameterized by (n, d), considers isotropic Gaussian features, Σ = I_d. For this ensemble, we will fix n and study the evolution of various quantities as a function of d/n. Definition 5 (Bi-level ensemble(n, p, q, r)): the bi-level ensemble is parameterized by (n, p, q, r), where p > 1, 0 ≤ r < 1, and 0 < q < (p − r). Here, parameter p controls the extent of artificial overparameterization, r sets the number of preferred features, and q controls the weights on preferred features and thus the effective overparameterization.
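The phenomenon behind Figure 3 (every training point becoming a support vector under heavy overparameterization) is easy to probe numerically: the minimum-ℓ2-norm interpolator of ±1 labels satisfies y_i x_i^T θ = 1 for every training point, which is exactly the "all points on the margin" condition under which the hard-margin SVM coincides with it. A minimal NumPy sketch (the dimensions and seed below are illustrative choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5000  # heavily overparameterized: d >> n
X = rng.standard_normal((n, d))          # isotropic Gaussian features
y = rng.choice([-1.0, 1.0], size=n)      # random binary labels

# Minimum-l2-norm interpolator of the +/-1 labels:
# theta = X^T (X X^T)^{-1} y  (the Gram matrix is invertible a.s. for d >= n)
G = X @ X.T
theta = X.T @ np.linalg.solve(G, y)

# theta interpolates the labels exactly, so every training point sits
# on the margin y_i x_i^T theta = 1 -- the condition under which the
# hard-margin SVM solution coincides with the min-norm interpolator.
margins = y * (X @ theta)
print(margins.min(), margins.max())  # both very close to 1
```

The design choice here is to solve the n-by-n Gram system rather than form a d-by-d matrix, which is what makes the overparameterized computation cheap.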
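To make Definition 5 concrete, the bi-level ensemble's covariance spectrum can be sketched as follows. The exact constants below (d = n^p features, s = n^r favored directions carrying an a = n^{-q} share of a trace normalized to d) are an assumed parameterization consistent with the stated roles of p, q, and r, not quoted from the paper:

```python
import numpy as np

def bilevel_spectrum(n, p, q, r):
    """Two-level eigenvalue profile for the bi-level ensemble.

    Assumed parameterization (consistent with Definition 5's roles):
    d = n**p features, s = n**r favored ones, the favored block carries
    an a = n**-q fraction of the total trace, and the trace equals d.
    """
    d, s = round(n ** p), round(n ** r)
    a = n ** (-q)
    lam = np.full(d, (1 - a) * d / (d - s))  # low level: unfavored features
    lam[:s] = a * d / s                      # high level: favored features
    return lam

# Matches the figure's setting: p = 3/2, r = 1/2, n = 529 gives d = 12167;
# q = 1/2 satisfies 0 < q < p - r.
lam = bilevel_spectrum(n=529, p=1.5, q=0.5, r=0.5)
print(lam[0], lam[-1], lam.sum())
```

Raising q shrinks a and flattens the spectrum, which is how effective overparameterization is dialed up while n and d stay fixed.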