PAC-Chernoff Bounds: Understanding Generalization in the Interpolation Regime
Authors: Andres R. Masegosa, Luis A. Ortega
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper introduces a distribution-dependent PAC-Chernoff bound that exhibits perfect tightness for interpolators, even within over-parameterized model classes. This bound, which relies on basic principles of Large Deviation Theory, defines a natural measure of the smoothness of a model, characterized by simple real-valued functions. Building upon this bound and the new concept of smoothness, we present a unified theoretical framework revealing why certain interpolators show exceptional generalization while others falter. We theoretically show how a wide spectrum of modern learning methodologies... Figure 2: Metrics of Inception models on Cifar10 using ℓ2 regularization and/or random cropping (Crop), and randomly sampled class labels (Random). The corresponding rate functions are shown on the right. Appendix A. Experimental Settings |
| Researcher Affiliation | Academia | Andrés R. Masegosa, EMAIL, Department of Computer Science, Aalborg University; Luis A. Ortega, EMAIL, Machine Learning Group, Department of Computer Science, Escuela Politécnica Superior, Universidad Autónoma de Madrid |
| Pseudocode | No | The paper contains numerous mathematical definitions, theorems, propositions, and proofs, but no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | Yes | A GitHub repository with the conducted experiments can be found at https://github.com/Ludvins/2024_PAC-Chernoff-Bound. |
| Open Datasets | Yes | Cifar10 dataset (Krizhevsky et al., 2009). |
| Dataset Splits | Yes | Subsets of size n = 50 of CIFAR10's test split are used to approximate samples of the data-generating distribution and build the histograms. (Appendix A.1, "Figure 3") All models were found by running stochastic gradient descent on Cifar10's training data, until the training loss reached 0.01 or until it did not improve over two consecutive epochs of training. (Footnote, page 12) |
| Hardware Specification | No | AM acknowledges funding for cloud computing from Google Cloud for Researchers program, from Grant PID2022-139293NB-C31 funded by MCIN/AEI/10.13039/501100011033 and by ERDF, a way of making Europe. Explanation: The paper mentions "cloud computing from Google Cloud" but does not specify any particular hardware (e.g., GPU models, CPU types, or memory) used for the experiments. |
| Software Dependencies | No | Random cropping is employed using the RandomResizedCrop function of torchvision with scale (0.8, 1.0) and ratio (0.9, 1.1). (Appendix A.1, "Figure 2") Both transformations are computed using the RandomAffine function of torchvision. (Appendix A.1, "Figure 9") The random shuffling of the pixels was performed using a random permutation from NumPy; the dataset was fully permuted and stored as a new dataset. (Appendix A.1, "Figure 10") Explanation: The paper mentions software libraries like 'torchvision' and 'NumPy' but does not provide specific version numbers for any software components, which is required for reproducibility. |
| Experiment Setup | Yes | For these experiments, all Inception models were trained using SGD with momentum 0.9 and learning rate 0.01 with exponential decay of 0.95. All models were trained for 30,000 iterations with batches of size 200, or until the train loss fell below 0.005. (Appendix A.1, "Figure 2") |
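The pixel-shuffling step quoted under Software Dependencies (a fixed random permutation applied to the whole dataset) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the seed and the choice to permute pixel positions while keeping channels together are assumptions, and the torchvision crop settings are only echoed in the comments.

```python
import numpy as np

# Random cropping in the paper uses torchvision's RandomResizedCrop
# with scale=(0.8, 1.0) and ratio=(0.9, 1.1); shown here as a comment
# only, since this sketch is NumPy-only.

# One fixed permutation of the 32x32 pixel positions, applied to every
# image so the dataset is "fully permuted and stored as a new dataset".
rng = np.random.default_rng(0)  # seed is an assumption
perm = rng.permutation(32 * 32)

def shuffle_pixels(img):
    """Apply the fixed pixel permutation to a 32x32xC image array."""
    h, w, c = img.shape
    flat = img.reshape(h * w, c)   # one row per pixel, channels intact
    return flat[perm].reshape(h, w, c)
```

Because the permutation is fixed, the same spatial scrambling is applied consistently across train and test images.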
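The training schedule in the Experiment Setup row can be written out as plain arithmetic. This is a hedged sketch of the reported hyperparameters, not the authors' training loop; in particular, the paper does not state the unit of the 0.95 exponential decay, so applying it per epoch is an assumption, as is the CIFAR-10 training-set size of 50,000.

```python
# Reported setup: SGD, momentum 0.9, initial lr 0.01, exponential
# decay 0.95, up to 30,000 iterations with batch size 200, early
# stopping once the train loss is under 0.005.
TRAIN_SIZE = 50_000      # CIFAR-10 training set (assumption)
BATCH_SIZE = 200
MAX_ITERS = 30_000

steps_per_epoch = TRAIN_SIZE // BATCH_SIZE   # iterations per epoch
max_epochs = MAX_ITERS // steps_per_epoch    # epochs within the budget

def lr_at_epoch(epoch, base_lr=0.01, decay=0.95):
    """Exponentially decayed learning rate (per-epoch decay assumed)."""
    return base_lr * decay ** epoch
```

Under these assumptions the 30,000-iteration budget corresponds to 150 batches per epoch short of 250, i.e. 120 full passes over the training data.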