A Unifying Information-theoretic Perspective on Evaluating Generative Models
Authors: Alexis Fox, Samarth Swarup, Abhijin Adiga
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our detailed experimental results demonstrate the sensitivity of our metric components to their respective qualities and reveal undesirable behaviors of other metrics. We set k to 5, k = 15, and omit the subscripts from the metric abbreviations for brevity hereafter. We follow the recommendation of Stein et al. (2024) to embed the images with the DINOv2-ViT-L/14 encoder (Oquab et al. 2024), which they claim provides a richer representation space than the commonly used Inception network, which may unfairly punish diffusion models. This motivates our generalized abbreviation FD. Dataset Descriptions. We use both ImageNet (Deng et al. 2009) and CIFAR-10 (Krizhevsky, Hinton et al. 2009) image datasets for our analysis. |
| Researcher Affiliation | Academia | Alexis Fox1, Samarth Swarup2, Abhijin Adiga2 1Duke University 2University of Virginia EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes mathematical formulations and derivations but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/NSSAC/PrecisionRecallMetric |
| Open Datasets | Yes | We use both ImageNet (Deng et al. 2009) and CIFAR-10 (Krizhevsky, Hinton et al. 2009) image datasets for our analysis. |
| Dataset Splits | No | The paper mentions the composition of the datasets (e.g., "sampled training set for ImageNet contains 1000 classes with 100 images each, while CIFAR-10 has 10 classes with 4500 images each"), but it does not specify explicit training/validation/test splits used for the experiments. |
| Hardware Specification | No | The paper mentions the use of specific models like the "DINOv2-ViT-L/14 encoder" and the "DiT-XL/2 model" for image embedding and generation, but does not provide specific hardware details (e.g., GPU models, CPU types, or memory specifications) used to run these processes or experiments. |
| Software Dependencies | No | The paper mentions software components like the "DINOv2-ViT-L/14 encoder" and names various models, but it does not provide specific version numbers for any key software libraries, frameworks, or environments (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper describes experimental conditions related to model parameters (e.g., "image sets generated at five levels of the CFG parameter", "100 classes were dropped at a time"). However, it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or general system-level training settings typically found in an experimental setup description. |
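For context on the k-NN-based precision/recall family of metrics the review discusses (where the k values 5 and 15 are used), the following is a minimal sketch of the classic construction from Kynkäänniemi et al. (2019), which metrics of this kind build on: each point's k-th-nearest-neighbor distance defines a ball, and coverage of one sample's points by the other sample's balls gives precision and recall. The function names and the brute-force distance computation here are illustrative assumptions, not the authors' code, and the paper's own unified information-theoretic metric differs from this baseline.

```python
import numpy as np

def knn_radii(X, k):
    """Distance from each point in X to its k-th nearest neighbor within X."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d.sort(axis=1)          # column 0 is the self-distance (0.0)
    return d[:, k]

def knn_precision_recall(real, fake, k=5):
    """k-NN precision/recall between embedded real and generated samples.

    precision: fraction of fake points inside at least one real k-NN ball
    recall:    fraction of real points inside at least one fake k-NN ball
    """
    r_real = knn_radii(real, k)
    r_fake = knn_radii(fake, k)
    d_rf = np.linalg.norm(real[:, None, :] - fake[None, :, :], axis=-1)
    precision = float((d_rf <= r_real[:, None]).any(axis=0).mean())
    recall = float((d_rf <= r_fake[None, :]).any(axis=1).mean())
    return precision, recall
```

In practice the inputs would be encoder embeddings (e.g., from the DINOv2 encoder the paper uses) rather than raw pixels, and the pairwise-distance step would use an approximate-nearest-neighbor library at ImageNet scale.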