Generalization in VAE and Diffusion Models: A Unified Information-Theoretic Analysis
Authors: Qi Chen, Jierui Zhu, Florian Shkurti
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results on both synthetic and real datasets illustrate the validity of the proposed theory. |
| Researcher Affiliation | Academia | Qi Chen1,3,4, Jierui Zhu2, & Florian Shkurti1,4,5. 1Department of Computer Science, University of Toronto; 2Department of Statistical Sciences, University of Toronto; 3Data Science Institute; 4Vector Institute; 5Robotics Institute. Correspondence to: EMAIL, EMAIL. |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual descriptions, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our experimental code is available at https://github.com/livreQ/InfoGenAnalysis. |
| Open Datasets | Yes | Empirical results on both synthetic and real datasets illustrate the validity of the proposed theory. ...We begin by validating the theorem on a simple synthetic 2D dataset derived from the Swiss Roll dataset. ...We further estimate the bound and the test data KL divergence (or log densities) by training DMs on MNIST and CIFAR10 datasets with few-shot data (m = 16) and full train dataset. |
| Dataset Splits | Yes | We train the score matching model sθ(x, t) and estimate the upper bound in Theorem 6.2 on a training set of size m. W.r.t. the expectation over dataset S, we conduct 5-times Monte Carlo estimation by randomly generating train datasets with different random seeds. For the left-hand-side KL divergence, we conduct Monte Carlo estimation with 1000 test data points. ...We further estimate the bound and the test data KL divergence (or log densities) by training DMs on MNIST and CIFAR10 datasets with few-shot data (m = 16) and the full train dataset. |
| Hardware Specification | Yes | The experiments for Swiss Roll data were run on a machine with 1 2080Ti GPU with 11GB memory. The experiments for MNIST and CIFAR10 were run on several server nodes with 6 CPUs and 1 GPU with 32GB memory. |
| Software Dependencies | No | We optimized the model parameters using the Adam optimizer with a learning rate of η = 10^-3. The training was conducted with a batch size of 64, while the remaining Adam hyperparameters were kept at their default values in PyTorch. |
| Experiment Setup | Yes | We optimized the model parameters using the Adam optimizer with a learning rate of η = 10^-3. The training was conducted with a batch size of 64, while the remaining Adam hyperparameters were kept at their default values in PyTorch. The model was trained for 100 epochs. ...The score matching model sθ(x, t) is trained for 10000 iterations, and the backward generation takes 1000 steps, i.e., N = 1000. |
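The evaluation protocol quoted above (5-seed Monte Carlo over few-shot train sets of size m = 16, with the test-side KL divergence estimated from 1000 held-out points) can be sketched as follows. This is a hypothetical toy illustration, not the paper's code: a 1D Gaussian fit stands in for the score-matching diffusion model, and `estimate_kl` is an invented helper name.

```python
# Hypothetical sketch of the protocol: average a Monte Carlo KL estimate
# over 5 random seeds, each seed drawing a fresh few-shot train set (m = 16)
# and 1000 test points. A Gaussian toy model replaces the paper's DM.
import math
import random

def log_normal_pdf(x, mu, sigma):
    # Log-density of N(mu, sigma^2) at x.
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def estimate_kl(seed, m=16, n_test=1000):
    rng = random.Random(seed)
    # "Train": fit a Gaussian to m few-shot samples from the true N(0, 1).
    train = [rng.gauss(0.0, 1.0) for _ in range(m)]
    mu = sum(train) / m
    sigma = max(math.sqrt(sum((x - mu) ** 2 for x in train) / m), 1e-3)
    # Monte Carlo estimate of KL(p_test || p_model) from n_test samples:
    # average log-density ratio under samples from the true distribution.
    test = [rng.gauss(0.0, 1.0) for _ in range(n_test)]
    return sum(log_normal_pdf(x, 0.0, 1.0) - log_normal_pdf(x, mu, sigma)
               for x in test) / n_test

# Expectation over the dataset S: average over 5 different random seeds.
kl_estimates = [estimate_kl(seed) for seed in range(5)]
mean_kl = sum(kl_estimates) / len(kl_estimates)
print(f"mean KL over 5 seeds: {mean_kl:.4f}")
```

The per-seed variance of `kl_estimates` also indicates how sensitive the bound estimate is to the particular few-shot train set drawn, which is why the report quotes 5-times Monte Carlo estimation rather than a single run.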