Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation

Authors: Lokesh Veeramacheneni, Moritz Wolter, Hilde Kuehne, Juergen Gall

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. We conclude with an extensive evaluation of a wide variety of generators across various datasets, showing that the proposed FWD generalizes and improves robustness to domain shifts and various corruptions compared to other metrics. Our first series of experiments demonstrates the effect of domain bias on learned metrics and the resilience of FWD to such bias.
Researcher Affiliation: Collaboration. Lokesh Veeramacheneni (University of Bonn, EMAIL); Moritz Wolter (University of Bonn, EMAIL); Hildegard Kuehne (University of Tuebingen and MIT-IBM Watson AI Lab, EMAIL); Juergen Gall (University of Bonn and Lamarr Institute for Machine Learning and Artificial Intelligence, EMAIL).
Pseudocode: No. The paper describes the methodology using prose, mathematical formulas (Equations 1-6), and a flowchart (Figure 3), but it does not include any explicitly labeled pseudocode or algorithm blocks.
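Although the paper provides no pseudocode, the pipeline it describes (a full wavelet packet decomposition of each image, followed by a Fréchet distance over packet statistics) can be sketched. The snippet below is a minimal pure-NumPy Haar illustration, not the PyTorch-FWD implementation; the function names, the fixed Haar filter, and the level-2 default are illustrative assumptions.

```python
import numpy as np

def haar_split(x):
    # One orthonormal 2D Haar analysis step: approximation plus
    # three detail subbands, each half the size in both dimensions.
    a = (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 2.0
    h = (x[0::2, 0::2] + x[0::2, 1::2] - x[1::2, 0::2] - x[1::2, 1::2]) / 2.0
    v = (x[0::2, 0::2] - x[0::2, 1::2] + x[1::2, 0::2] - x[1::2, 1::2]) / 2.0
    d = (x[0::2, 0::2] - x[0::2, 1::2] - x[1::2, 0::2] + x[1::2, 1::2]) / 2.0
    return [a, h, v, d]

def packet_features(image, level=2):
    # Full wavelet packet tree: unlike the plain wavelet transform,
    # every subband (not just the approximation) is split again.
    packets = [image]
    for _ in range(level):
        packets = [band for p in packets for band in haar_split(p)]
    return np.concatenate([p.ravel() for p in packets])

# An 8x8 image at level 2 yields 4**2 = 16 packets of 2x2
# coefficients, i.e. 64 values; the transform preserves energy.
img = np.random.default_rng(0).random((8, 8))
feats = packet_features(img)
```

The orthonormal normalization (division by 2 per 2D step) keeps the total coefficient energy equal to the image energy, which is why packet statistics remain comparable across decomposition levels.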
Open Source Code: Yes. The source code for computing FWD is available at https://github.com/BonnBytes/PyTorch-FWD.
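The released PyTorch-FWD package computes the metric end to end. Its final step, a Fréchet (2-Wasserstein) distance between two Gaussian summaries of feature statistics, can be sketched independently; this is a generic illustration of the FID-style Fréchet distance, not code from the repository, and the helper names are illustrative.

```python
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Matrix square root of the covariance product; tiny imaginary
    # parts caused by numerical error are discarded.
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

def gaussian_stats(features):
    # features: (n_samples, n_dims) array of per-image feature vectors.
    return features.mean(axis=0), np.cov(features, rowvar=False)

# Toy example: two feature sets whose means differ by 0.5 per dimension.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 4))
fake = rng.normal(loc=0.5, size=(500, 4))
d = frechet_distance(*gaussian_stats(real), *gaussian_stats(fake))
```

In FWD the feature vectors would be wavelet packet statistics rather than network activations, but the Gaussian-summary-plus-Fréchet-distance step is the same shape of computation.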
Open Datasets: Yes. The evaluation uses Large-scale CelebFaces Attributes High Quality (CelebA-HQ) (Karras et al., 2018), Flickr Faces High Quality (FFHQ), the DNDD-Dataset (Yi et al., 2023), an agricultural dataset, and Sentinel (Schmitt et al., 2019), a remote sensing dataset. For the second user study, Conceptual Captions (Sharma et al., 2018) serves as the evaluation dataset.
Dataset Splits: No. The paper mentions specific numbers of images for evaluation (e.g., 50k and 30k images) and refers to ImageNet's validation set, but it does not explicitly describe training/validation/test splits, their percentages, or how they were created for the datasets used.
Hardware Specification: Yes. Proj. FastGAN was trained for 100 epochs on both the CelebA-HQ and DNDD datasets with a learning rate of 1e-4 and a batch size of 64 on 8 A100 GPUs, and for 150 epochs on the Sentinel dataset with the same hardware and hyperparameters. DDGAN was trained on the DNDD-Dataset for 150 epochs with a learning rate of 1e-4 and a batch size of 8 on the same hardware, and on the Sentinel dataset for 250 epochs with a learning rate of 1e-4 and a batch size of 4 on 4 A100 GPUs.
Software Dependencies: No. The paper mentions software such as PyTorch, PyWavelets, and the PyTorch Wavelet Toolbox, but it does not specify their version numbers, which are necessary for reproducible software dependencies.
Experiment Setup: Yes. Proj. FastGAN was trained for 100 epochs on both the CelebA-HQ and DNDD datasets with a learning rate of 1e-4 and a batch size of 64 on 8 A100 GPUs, and for 150 epochs on the Sentinel dataset with the same hardware and hyperparameters. DDGAN was trained on the DNDD-Dataset for 150 epochs with a learning rate of 1e-4 and a batch size of 8 on the same hardware, and on the Sentinel dataset for 250 epochs with a learning rate of 1e-4 and a batch size of 4 on 4 A100 GPUs.
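For reference, the reported training configurations can be collected into a single lookup table. This is a hypothetical summary structure; the dictionary keys and names are illustrative, and only the numbers quoted above are taken from the paper.

```python
# Hypothetical summary of the training setups reported in the paper.
# Keys are (model, dataset); values hold the quoted hyperparameters.
TRAIN_CONFIGS = {
    ("Proj. FastGAN", "CelebA-HQ"): dict(epochs=100, lr=1e-4, batch_size=64, gpus="8xA100"),
    ("Proj. FastGAN", "DNDD"):      dict(epochs=100, lr=1e-4, batch_size=64, gpus="8xA100"),
    ("Proj. FastGAN", "Sentinel"):  dict(epochs=150, lr=1e-4, batch_size=64, gpus="8xA100"),
    ("DDGAN", "DNDD"):              dict(epochs=150, lr=1e-4, batch_size=8,  gpus="8xA100"),
    ("DDGAN", "Sentinel"):          dict(epochs=250, lr=1e-4, batch_size=4,  gpus="4xA100"),
}
```

A table like this makes it easy to spot what the prose obscures: the learning rate is constant across all runs, while epochs, batch size, and GPU count vary per model and dataset.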