Finite-sample Analysis of Interpolating Linear Classifiers in the Overparameterized Regime
Authors: Niladri S. Chatterji, Philip M. Long
JMLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Simulations We experimentally study the behavior of the maximum margin classifier in the overparameterized regime on synthetic data generated according to the Boolean noisy rare-weak model. Recall that this is a model where the clean label ỹ ∈ {−1, 1} is first generated by flipping a fair coin. Then the covariate x is drawn from a distribution conditioned on ỹ such that s of the coordinates of x are equal to ỹ with probability 1/2 + γ and the other p − s coordinates are random and independent of the true label. The noisy label y is obtained by flipping ỹ with probability η. In this section the flipping probability η is always 0.1. In all our experiments the number of samples n is kept constant at 100. In the first experiment in Figure 2 we hold n and the number of relevant attributes s constant and vary the dimension p for different values of γ. We find that after an initial dip in the test error (for γ = 0.2, 0.3) the test error starts to rise slowly with p, as in our upper bounds. Next, in Figure 3 we explore the scaling of the test error with the number of relevant attributes s when n and p are held constant. As we would expect, the test error decreases as s grows for all the different values of γ. Finally, in Figure 4 we study how the test error changes when both p and s are increasing when n and γ are held constant. Our results (see Corollary 3) do not guarantee learning when s = Θ(√p) (and Jin (2009) proved that learning is impossible in a related setting, even in the absence of noise); we find that the test error remains constant in our experiment in this setting. In the cases when s = p^0.55 and when s = p^0.65, slightly beyond this threshold, the test error approaches the Bayes-optimal error as p gets large in our experiment. |
| Researcher Affiliation | Collaboration | Niladri S. Chatterji EMAIL University of California, Berkeley, 366 Le Conte Hall, Berkeley, CA 94720 Philip M. Long EMAIL Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043 |
| Pseudocode | No | The paper includes mathematical proofs, lemmas, and theoretical derivations (e.g., Lemma 4, Lemma 6, Lemma 7, Lemma 8, Lemma 11, Lemma 22), but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states that "a preliminary version of this paper was posted on arXiv (Chatterji and Long, 2020)" which is a preprint server, but it does not provide an explicit statement about code release for the methodology described in the paper, nor does it include any links to code repositories. |
| Open Datasets | No | The paper uses "synthetic data generated according to the Boolean noisy rare-weak model" for its simulations. It describes the data generation process in detail but does not provide access information (link, DOI, repository) to an existing public dataset or to the specific synthetic data generated for the experiments. Therefore, it does not provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | In the "Simulations" section, the paper mentions that "The number of samples n = 100 is kept fixed" and that "The plot is generated by averaging over 500 draws of the samples. The train error on all runs was always 0." While it mentions 'test error' and 'train error,' it does not specify how the initial n=100 samples are split into training, validation, or test sets, or any specific methodology for partitioning the data, beyond implying that the 'test error' is evaluated on separate data from the training process. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the simulations or experiments described in Section 5. |
| Software Dependencies | No | The paper does not specify any software dependencies, libraries, or programming languages with version numbers that would be needed to reproduce the experiments. |
| Experiment Setup | No | The "Simulations" section describes the parameters used for generating the synthetic data (e.g., "flipping probability η is always 0.1", "number of samples n = 100", "number of relevant variables s = 50", varying "dimension p in the interval [100, 3000]", different values of "γ"). However, it does not provide details about the experimental setup for training the maximum margin classifier itself, such as specific hyperparameters (e.g., learning rate, number of iterations) for the gradient descent process mentioned in the theoretical analysis. |
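The Boolean noisy rare-weak model quoted in the Research Type row above is fully specified by (n, p, s, γ, η), so the data-generation step is straightforward to sketch. The following NumPy snippet is our own illustrative reconstruction from the paper's description, not the authors' code; the function and variable names are invented for this sketch:

```python
import numpy as np

def sample_rare_weak(n, p, s, gamma, eta, rng):
    """Draw n samples from the Boolean noisy rare-weak model as described
    in Section 5 of the paper (illustrative sketch, not the authors' code)."""
    # Clean labels: fair coin flips in {-1, +1}.
    y_clean = rng.choice([-1, 1], size=n)
    # Relevant coordinates (taken to be the first s, WLOG): each equals the
    # clean label with probability 1/2 + gamma, otherwise its negation.
    agree = rng.random((n, s)) < 0.5 + gamma
    X_rel = np.where(agree, y_clean[:, None], -y_clean[:, None])
    # The remaining p - s coordinates are uniform random signs,
    # independent of the true label.
    X_irr = rng.choice([-1, 1], size=(n, p - s))
    X = np.hstack([X_rel, X_irr])
    # Observed (noisy) labels: flip the clean label with probability eta.
    flip = rng.random(n) < eta
    y = np.where(flip, -y_clean, y_clean)
    return X, y, y_clean

# Parameters from the Simulations section: n = 100, eta = 0.1,
# s = 50 relevant attributes, dimension p in [100, 3000].
rng = np.random.default_rng(0)
X, y, y_clean = sample_rare_weak(n=100, p=1000, s=50, gamma=0.3, eta=0.1, rng=rng)
```

A maximum margin classifier would then be fit on (X, y); the paper's observation that "the train error on all runs was always 0" is consistent with the data being linearly separable in this overparameterized regime (p ≫ n).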