A Theoretical Framework For Overfitting In Energy-based Modeling
Authors: Giovanni Catania, Aurélien Decelle, Cyril Furtlehner, Beatriz Seoane
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This work develops a theoretical framework for understanding and mitigating overfitting in EBMs. We begin with a simple Gaussian model as a fundamental non-trivial example, using it to quantitatively analyze overfitting through synthetic experiments with predefined ground truths. We examine eigenvalue dynamics using artificial covariance matrices that simulate real datasets, exploring how overfitting arises from different learning timescales associated with various eigenmodes of the empirical covariance matrix. We address inaccuracies in learned eigenvalues with corrections based on random matrix theory (RMT)... |
| Researcher Affiliation | Academia | 1Departamento de Física Teórica, Universidad Complutense de Madrid, Spain. 2Escuela Técnica Superior de Ingenieros Industriales, Universidad Politécnica de Madrid, Spain. 3Inria-Saclay, Université Paris-Saclay, LISN, Gif-sur-Yvette, France. |
| Pseudocode | No | The paper contains mathematical equations and descriptions of methods, but no explicit sections or figures labeled as "Pseudocode" or "Algorithm" with structured, code-like steps. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing code, nor does it provide links to code repositories or mention code in supplementary materials. |
| Open Datasets | Yes | Figure 1. (a): Eigenvalue spectra of the empirical covariance matrices for MNIST dataset (Deng, 2012)... We illustrate how the principal components control a timescale separation, where information progressively encoded from the strongest to the weakest data modes... The spectra for CIFAR-10 (Krizhevsky et al., 2009) and the Human Genome Dataset (Consortium et al., 2015) are displayed in (a) and (b), respectively. |
| Dataset Splits | No | The paper discusses generating data points with finite 'M' samples for training and evaluates 'Etrain' and 'Etest' or 'LLtrain,test', implying a distinction between training and testing data. However, it does not provide specific details on how empirical datasets were split into training, validation, or test sets with percentages, absolute counts, or predefined split methodologies, which is required for reproducing data partitioning. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory amounts, or cloud instances) used for running its experiments. |
| Software Dependencies | No | The paper describes mathematical and algorithmic methodologies but does not specify any software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their specific versions) used for implementation or experimentation. |
| Experiment Setup | Yes | J_{ij}^{t+1} = J_{ij}^{t} + γ ∂L/∂J_{ij}, where γ is the learning rate... In all cases the initial condition is an identity matrix. (Figure 2)... The learning rate is set to γ = 10^-3. (Appendix C)... The learning rate is set to γ = 10^-2. (Appendix I)... starting from the same initial condition (J_α(0) = 1)... Starting from an initial condition J(0) that does not commute with C... |
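The gradient-ascent setup quoted above can be sketched for the paper's Gaussian case. A minimal illustration, with assumptions not taken from the paper: the log-likelihood gradient ∂L/∂J ∝ J⁻¹ − C is the standard Gaussian energy-based-model result (any constant prefactor is absorbed into γ), and the toy dimensions, sample count, and random seed below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: a well-conditioned ground-truth precision matrix
# and a finite sample of M training points (finite M is what drives
# overfitting in the paper's analysis).
d, M = 5, 500
B = 0.1 * rng.normal(size=(d, d))
J_true = np.eye(d) + (B + B.T) / 2                 # symmetric positive definite
x = rng.multivariate_normal(np.zeros(d), np.linalg.inv(J_true), size=M)
C = x.T @ x / M                                    # empirical covariance

# Gradient ascent on the Gaussian log-likelihood,
#   L(J) = (1/2) log det J - (1/2) Tr(J C),  so  dL/dJ ∝ J^{-1} - C,
# starting from the identity initial condition, as in the paper.
gamma = 0.05
J = np.eye(d)
for _ in range(5000):
    J += gamma * (np.linalg.inv(J) - C)

# The fixed point of the dynamics is the inverse *empirical* covariance:
# the model fits C exactly, including its finite-M sampling noise.
err = np.max(np.abs(J - np.linalg.inv(C)))
```

The fixed point J = C⁻¹ makes the overfitting mechanism concrete: the learned couplings reproduce the empirical covariance of the M samples, not the true one, and (as the paper analyzes) each eigenmode of C relaxes toward this fixed point on its own timescale.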