PAC-Chernoff Bounds: Understanding Generalization in the Interpolation Regime

Authors: Andres R. Masegosa, Luis A. Ortega

JAIR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental This paper introduces a distribution-dependent PAC-Chernoff bound that exhibits perfect tightness for interpolators, even within over-parameterized model classes. This bound, which relies on basic principles of Large Deviation Theory, defines a natural measure of the smoothness of a model, characterized by simple real-valued functions. Building upon this bound and the new concept of smoothness, we present a unified theoretical framework revealing why certain interpolators show exceptional generalization, while others falter. We theoretically show how a wide spectrum of modern learning methodologies... Figure 2: Metrics of Inception models on Cifar10 using ℓ2 regularization and/or random cropping (Crop), and randomly sampled class labels (Random). The corresponding rate functions are shown on the right. (Appendix A, "Experimental Settings")
Researcher Affiliation Academia Andrés R. Masegosa (EMAIL), Department of Computer Science, Aalborg University; Luis A. Ortega (EMAIL), Machine Learning Group, Department of Computer Science, Escuela Politécnica Superior, Universidad Autónoma de Madrid
Pseudocode No The paper contains numerous mathematical definitions, theorems, propositions, and proofs, but no explicit pseudocode or algorithm blocks are provided.
Open Source Code Yes A GitHub repository with the conducted experiments can be found at https://github.com/Ludvins/2024_PAC-Chernoff-Bound.
Open Datasets Yes Cifar10 dataset (Krizhevsky et al., 2009).
Dataset Splits Yes Subsets of size n = 50 of CIFAR10's test split are used to approximate samples of the data-generating distribution and build the histograms. (Appendix A.1, "Figure 3") All models were found by running stochastic gradient descent on Cifar10's training data until the training loss reached 0.01 or until it did not improve in two consecutive epochs of training. (Footnote, page 12)
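The subset-sampling step quoted above can be sketched as follows. This is a minimal illustration assuming the standard 10,000-image CIFAR-10 test split; the helper name `sample_subsets`, the seed, and the number of subsets are illustrative, not taken from the authors' code:

```python
import numpy as np

N_TEST = 10_000      # size of the standard CIFAR-10 test split (assumption)
SUBSET_SIZE = 50     # n = 50, as reported in Appendix A.1

def sample_subsets(num_subsets, rng):
    """Draw index subsets without replacement; each subset approximates
    an i.i.d. sample from the data-generating distribution, from which
    the histograms are built."""
    return [rng.choice(N_TEST, size=SUBSET_SIZE, replace=False)
            for _ in range(num_subsets)]

rng = np.random.default_rng(0)
subsets = sample_subsets(100, rng)
```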
Hardware Specification No AM acknowledges funding for cloud computing from Google Cloud for Researchers program, from Grant PID2022-139293NB-C31 funded by MCIN/AEI/10.13039/501100011033 and by ERDF, a way of making Europe. Explanation: The paper mentions "cloud computing from Google Cloud" but does not specify any particular hardware (e.g., GPU models, CPU types, or memory) used for the experiments.
Software Dependencies No Random cropping is employed using the RandomResizedCrop function of torchvision with scale (0.8, 1.0) and ratio (0.9, 1.1). (Appendix A.1, "Figure 2") Both transformations are computed using the RandomAffine function of torchvision. (Appendix A.1, "Figure 9") The random shuffling of the pixels was performed using a random permutation from NumPy; the dataset was fully permuted and stored as a new dataset. (Appendix A.1, "Figure 10") Explanation: The paper mentions software libraries like torchvision and NumPy but does not provide version numbers for any software components, which are required for reproducibility.
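The fixed pixel permutation described above can be sketched in plain NumPy; this is an illustrative reconstruction, not the authors' code (the helper `permute_pixels` and the seed are assumptions). The cropping and affine augmentations would correspond to torchvision's `RandomResizedCrop(32, scale=(0.8, 1.0), ratio=(0.9, 1.1))` and `RandomAffine`, which are omitted here to keep the sketch dependency-free:

```python
import numpy as np

def permute_pixels(images, seed=0):
    """Draw one random pixel permutation and apply it identically to
    every image, as in the quoted setup where the dataset is fully
    permuted and stored as a new dataset."""
    n, h, w, c = images.shape
    rng = np.random.default_rng(seed)
    perm = rng.permutation(h * w)             # single permutation shared by all images
    flat = images.reshape(n, h * w, c)        # keep RGB channels together per pixel
    return flat[:, perm, :].reshape(n, h, w, c)

# Toy data standing in for CIFAR-10 (32x32 RGB images).
data = np.arange(2 * 32 * 32 * 3, dtype=np.uint8).reshape(2, 32, 32, 3)
shuffled = permute_pixels(data)
```

Because the permutation is drawn once with a fixed seed, re-running the function reproduces the same "new dataset", which is what makes the transformed dataset storable and reusable.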
Experiment Setup Yes For these experiments, all Inception models were trained using SGD with momentum 0.9 and learning rate 0.01 with exponential decay of 0.95. All models were trained for 30,000 iterations with batches of size 200, or until the training loss fell below 0.005. (Appendix A.1, "Figure 2")
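The optimization schedule quoted above can be sketched as a plain training loop. This is a hedged NumPy stand-in for the paper's setup, not the authors' PyTorch code; the `train` helper and the toy quadratic objective are assumptions, and `iters_per_epoch=250` assumes CIFAR-10's 50,000 training images divided into batches of 200:

```python
import numpy as np

def train(grad_fn, loss_fn, w, lr=0.01, momentum=0.9, decay=0.95,
          max_iters=30_000, loss_tol=0.005, iters_per_epoch=250):
    """SGD with momentum 0.9, initial learning rate 0.01 decayed by 0.95
    per epoch, stopping after 30,000 iterations or once the training
    loss drops below 0.005, mirroring the reported schedule."""
    v = np.zeros_like(w)
    for it in range(max_iters):
        lr_t = lr * decay ** (it // iters_per_epoch)  # exponential decay, once per epoch
        v = momentum * v - lr_t * grad_fn(w)          # momentum update
        w = w + v
        if loss_fn(w) < loss_tol:                     # early-stopping criterion
            break
    return w

# Toy quadratic problem: minimize 0.5 * ||w||^2, whose gradient is w.
w0 = np.array([3.0, -2.0])
w_final = train(lambda w: w, lambda w: 0.5 * (w @ w), w0)
```

On the real task the gradient would come from backpropagation through the Inception model on a batch of 200 CIFAR-10 images; only the control flow of the schedule is reproduced here.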