Aligned Datasets Improve Detection of Latent Diffusion-Generated Images
Authors: Anirudh Sundara Rajan, Utkarsh Ojha, Jedidiah Schloesser, Yong Jae Lee
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to assess our method. We train on images generated by the original LDM model (Rombach et al., 2022), and test on images generated by later versions of Stable Diffusion as well as newer latent models such as Playground (Li et al., 2024), Kandinsky (Razzhigaev et al., 2023), PixArt-α (Chen et al., 2023) and Latent Consistency Models (Luo et al., 2023). |
| Researcher Affiliation | Academia | Anirudh Sundara Rajan* Utkarsh Ojha* Jedidiah Schloesser Yong Jae Lee University of Wisconsin-Madison EMAIL, EMAIL |
| Pseudocode | No | The paper describes the process mathematically with a formula F = { φ_dec(φ_enc(x)) \| x ∈ R } but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | For implementation details, visit our project page: anisundar18.github.io/AlignedForensics/. We also intend to release our pre-trained checkpoints, datasets, and code to ensure reproducibility, with all resources made publicly available on GitHub. |
| Open Datasets | Yes | Similar to Corvi et al. (2022), we use a combination of MS COCO (Lin et al., 2015) and LSUN (Yu et al., 2016) as our real dataset, totaling 179,257 images. ... For the real set, we randomly select 500 images from the RedCaps dataset (Desai et al., 2021). For fake images, we generate 500 images using SD 1.5 (prompts pertain to object categories from CIFAR (Krizhevsky et al., 2010)). ... The Real set contains real images from multiple sources; 1000 images from RedCaps (Desai et al., 2021), 800 images from LAION-Aesthetics (Schuhmann et al., 2022), 1000 images from whichfaceisreal (whi) and 200 images from WikiArt (wik). |
| Dataset Splits | Yes | Similar to Corvi et al. (2022), we use a combination of MS COCO (Lin et al., 2015) and LSUN (Yu et al., 2016) as our real dataset, totaling 179,257 images. We reconstruct them using the autoencoder of the LDM model proposed by Rombach et al. (2022) to get the same number of fake images. ... We create a test set of real and fake images of increasing/decreasing resolutions. For the real set, we randomly select 500 images from the RedCaps dataset (Desai et al., 2021). For fake images, we generate 500 images using SD 1.5... We use the validation set provided by Corvi et al. (2022) for our training. ... The dataset consists of 6000 real images and 6000 images for each of the respective categories. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models or processor types used for running its experiments. |
| Software Dependencies | No | The paper mentions using Adam optimizer and data augmentations like 'random resized crop' (referencing torchvision), but it does not specify version numbers for any software libraries, frameworks, or languages used. |
| Experiment Setup | Yes | We optimize using Adam (Kingma & Ba, 2015) with an initial learning rate set to 0.0001. The rest of the training details can be found in Appendix A.1.1. ADDITIONAL TRAINING DETAILS: We train on 96 × 96 crops of the whole image using a batch size of 128. The data augmentations include random JPEG compression and blur from the pipeline proposed by Wang et al. (2020). Following Gragnaniello et al. (2021), grayscale, cutout and random noise are also used as augmentations. Finally, in order to make the network invariant towards resizing, the random resized crop was added. For our method as well as Corvi, we train the model using two different random seeds and report the average reading. We use the validation set provided by Corvi et al. (2022) for our training. Just like our training set, the real images come from COCO/LSUN and the fake images are generated at 256 × 256 using LDM. During training, if the validation accuracy does not improve by 0.1% in 10 epochs the learning rate is dropped by 10x. The training is terminated at learning rate 10⁻⁶. |
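The quoted dataset construction (reconstructing each real image through the LDM autoencoder, F = { φ_dec(φ_enc(x)) | x ∈ R }, to obtain content-aligned fakes) can be sketched as follows. This is a minimal illustration, not the authors' released code: `build_aligned_dataset`, `encode`, and `decode` are hypothetical stand-ins for the actual LDM autoencoder components.

```python
from typing import Callable, List, Tuple

def build_aligned_dataset(
    real_images: List[list],
    encode: Callable,   # stand-in for the LDM encoder phi_enc
    decode: Callable,   # stand-in for the LDM decoder phi_dec
) -> List[Tuple[list, int]]:
    """Pair every real image (label 0) with its autoencoder
    reconstruction phi_dec(phi_enc(x)) (label 1), so the real and
    fake sets share content and differ only in decoder artifacts."""
    dataset = []
    for x in real_images:
        dataset.append((x, 0))                  # real image
        dataset.append((decode(encode(x)), 1))  # aligned "fake"
    return dataset
```

In practice the encoder/decoder would be the pretrained LDM (Rombach et al., 2022) autoencoder applied to the 179,257 COCO/LSUN images quoted above; any lossy reconstruction leaves the generator fingerprint the detector is trained on.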
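The learning-rate schedule quoted in the Experiment Setup row (start at 1e-4, drop 10x when validation accuracy fails to improve by 0.1 percentage points within 10 epochs, terminate at 1e-6) can be sketched as a small helper. The class name and exact stopping semantics (train at 1e-6, stop when it would drop further) are assumptions for illustration; the paper does not give code.

```python
class PlateauSchedule:
    """Sketch of the quoted schedule: lr starts at 1e-4, is divided
    by 10 after 10 epochs without a >= 0.1-point gain in validation
    accuracy, and training stops once lr falls below 1e-6."""

    def __init__(self, lr=1e-4, min_lr=1e-6, patience=10, min_delta=0.1):
        self.lr, self.min_lr = lr, min_lr
        self.patience, self.min_delta = patience, min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, val_acc: float) -> bool:
        """Record this epoch's validation accuracy (in %).
        Returns True while training should continue."""
        if val_acc > self.best + self.min_delta:
            self.best, self.bad_epochs = val_acc, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr /= 10.0
                self.bad_epochs = 0
        # tolerant compare (min_lr / 2) avoids float round-off at 1e-6
        return self.lr > self.min_lr / 2
```

This mirrors the behavior of `torch.optim.lr_scheduler.ReduceLROnPlateau` with `mode='max'`, `factor=0.1`, `patience=10`, which is a plausible implementation choice given the description.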