Information Theoretic Text-to-Image Alignment

Authors: Chao Wang, Giulio Franzese, Alessandro Finamore, Massimo Gallo, Pietro Michiardi

ICLR 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our analysis indicates that our method is superior to the state-of-the-art, yet it only requires the pre-trained denoising network of the T2I model itself to estimate MI, and a simple finetuning strategy that improves alignment while maintaining image quality. Code available at https://github.com/Chao0511/mitune. [...] We perform an extensive experimental campaign using a recent T2I benchmark suite (Huang et al., 2023) and SD-2.1-base as base model, obtaining sizable improvement compared to six alternative methods (§4). |
| Researcher Affiliation | Collaboration | Chao Wang¹,², Giulio Franzese¹, Alessandro Finamore², Massimo Gallo², Pietro Michiardi¹ (¹EURECOM, ²Huawei Technologies SASU, France) |
| Pseudocode | Yes | Algorithm 1: MI-TUNE [...] Algorithm 2: Point-wise MI Estimation |
| Open Source Code | Yes | Code available at https://github.com/Chao0511/mitune. |
| Open Datasets | Yes | We compare all techniques using T2I-CompBench (Huang et al., 2023), a benchmark composed of 700/300 (train/test) prompts across 6 categories [...] We also assess MI-TUNE performance on more realistic prompts by sampling 5,000/1,250 (train/test) prompt-image pairs from DiffusionDB (Wang et al., 2022) [...] we compute the metrics using 30k samples of the MS-COCO-2014 (Lin et al., 2015) validation set. |
| Dataset Splits | Yes | We compare all techniques using T2I-CompBench (Huang et al., 2023), a benchmark composed of 700/300 (train/test) prompts across 6 categories [...] We also assess MI-TUNE performance on more realistic prompts by sampling 5,000/1,250 (train/test) prompt-image pairs from DiffusionDB (Wang et al., 2022) |
| Hardware Specification | Yes | GPUs for training: 1× NVIDIA A100 [...] on a single A100-80GB GPU |
| Software Dependencies | No | Table 8: Training hyperparameters. Trainable model: UNet [...] PEFT: DoRA (Liu et al., 2024), rank 32, α = 32 [...] Optimizer: AdamW. This table lists training components and techniques, not the specific software library versions (e.g., Python, PyTorch, CUDA) required for a "Yes" answer. |
| Experiment Setup | Yes | Table 8: Training hyperparameters. Trainable model: UNet; PEFT: DoRA (Liu et al., 2024); rank: 32; α: 32; learning rate (LR): 1e-4; gradient norm clipping: 1.0; LR scheduler: constant; LR warmup steps: 0; optimizer: AdamW (β1 = 0.9, β2 = 0.999, weight decay 1e-2, ϵ = 1e-8); resolution: 512×512; classifier-free guidance scale: 7.5; denoising steps: 50; batch size: 400; training iterations: 300 |
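The hyperparameters reported in Table 8 can be collected into a small configuration sketch. The dict names below (`peft_config`, `optim_config`, `train_config`) are hypothetical, chosen only to group the reported values; the `lora_param_count` helper is an illustration of the trainable-parameter overhead of a rank-r adapter, assuming the standard LoRA factorization and, for DoRA, one learned magnitude entry per weight column.

```python
# Hypothetical grouping of the Table 8 hyperparameters (names are illustrative).
peft_config = {
    "method": "DoRA",   # weight-decomposed low-rank adaptation (Liu et al., 2024)
    "rank": 32,
    "alpha": 32,
}

optim_config = {
    "optimizer": "AdamW",
    "lr": 1e-4,
    "betas": (0.9, 0.999),
    "weight_decay": 1e-2,
    "eps": 1e-8,
    "grad_norm_clip": 1.0,
    "lr_scheduler": "constant",
    "warmup_steps": 0,
}

train_config = {
    "trainable_model": "UNet",
    "resolution": (512, 512),
    "cfg_scale": 7.5,        # classifier-free guidance scale at sampling time
    "denoising_steps": 50,
    "batch_size": 400,
    "iterations": 300,
}


def lora_param_count(d_out: int, d_in: int, rank: int, dora: bool = False) -> int:
    """Trainable parameters a rank-`rank` adapter adds to one (d_out x d_in) weight.

    LoRA learns two factors, B (d_out x rank) and A (rank x d_in).
    DoRA additionally learns a magnitude vector with one entry per
    column of the weight matrix (assumption: column-wise magnitudes).
    """
    n = rank * (d_out + d_in)
    if dora:
        n += d_in
    return n


# Example: a 768x768 attention projection with the paper's rank of 32.
plain = lora_param_count(768, 768, peft_config["rank"])             # 49152
decomposed = lora_param_count(768, 768, peft_config["rank"], True)  # 49920
```

Note that at rank 32 the extra magnitude vector is a negligible fraction of the adapter's parameters, which is consistent with DoRA being used as a drop-in PEFT choice here.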