Efficient Distillation of Classifier-Free Guidance using Adapters

Authors: Cristian Perez Jensen, Seyedmorteza Sadat

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we show that AGD achieves comparable or superior FID to CFG across multiple architectures with only half the NFEs. Notably, our method enables the distillation of large models (≈2.6B parameters) on a single consumer GPU with 24 GB of VRAM, making it more accessible than previous approaches that require multiple high-end GPUs. We will publicly release the implementation of our method. Setup: We evaluate AGD on class-conditional generation using 256 × 256 DiT-XL/2 (Peebles & Xie, 2023), and text-to-image generation using 768 × 768 Stable Diffusion 2.1 (SD2.1) (Rombach et al., 2022) and 1024 × 1024 Stable Diffusion XL (SDXL) (Podell et al., 2024).
Researcher Affiliation | Academia | Cristian Perez Jensen (EMAIL, ETH Zürich); Seyedmorteza Sadat (EMAIL, ETH Zürich)
Pseudocode | Yes | Algorithm 1: Trajectory collection for AGD. Algorithm 2: Adapter training for AGD.
Open Source Code | No | We will publicly release the implementation of our method.
Open Datasets | Yes | ImageNet (Deng et al., 2009). For text-to-image models, we randomly select 500 captions from the COCO-2017 training set (Lin et al., 2014).
Dataset Splits | Yes | For training adapters on DiT, trajectories are sampled with guidance scales ω ∼ Unif([1, 6]), with four trajectories per class label of ImageNet (Deng et al., 2009). For text-to-image models, we randomly select 500 captions from the COCO-2017 training set (Lin et al., 2014), generating a single trajectory per caption with guidance scales ω ∼ Unif([1, 12]). ... The FID scores for class-conditional models were computed using 10k generated samples and the entire ImageNet training set. For text-to-image models, we used the full COCO-2017 validation set as the real data.
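The trajectory-collection recipe quoted above (for the DiT case: ω ∼ Unif([1, 6]), four trajectories per ImageNet class) can be sketched as follows. This is an illustrative reconstruction, not the paper's code; the function name, seed handling, and return format are assumptions.

```python
import random

def sample_guidance_scales(num_classes=1000, trajectories_per_class=4,
                           lo=1.0, hi=6.0, seed=0):
    """Sketch of the guidance-scale sampling described for DiT:
    omega ~ Unif([lo, hi]), `trajectories_per_class` trajectories per
    ImageNet class label. Returns (class_label, omega) pairs that would
    then drive trajectory collection with the teacher model.
    """
    rng = random.Random(seed)  # seeding is an assumption for reproducibility
    return [(c, rng.uniform(lo, hi))
            for c in range(num_classes)
            for _ in range(trajectories_per_class)]

pairs = sample_guidance_scales()
# 1000 classes x 4 trajectories = 4000 (class, omega) pairs
```

For the text-to-image models, the same idea applies with 500 COCO captions in place of class labels, one trajectory per caption, and `hi=12.0`.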
Hardware Specification | Yes | All experiments are conducted on a single RTX 4090 GPU (24 GB of VRAM).
Software Dependencies | No | The paper mentions the Adam optimizer and other techniques but does not specify versions for software libraries such as PyTorch, TensorFlow, or CUDA.
Experiment Setup | Yes | Training is performed using the Adam optimizer (Kingma & Ba, 2014) without weight decay, where the learning rate follows a linear warm-up to 1 × 10−4 over the first 10% of steps, after which it decays via a cosine annealing schedule (Loshchilov & Hutter, 2016). For training adapters on DiT, trajectories are sampled with guidance scales ω ∼ Unif([1, 6]), with four trajectories per class label of ImageNet (Deng et al., 2009). ... The DiT-XL/2 model was trained with a batch size of 64 for 5000 gradient steps, the SD2.1 model with a batch size of 8 for 5000 gradient steps, and the SDXL model with a batch size of 1 for 20000 gradient steps. These settings were selected based on the maximum batch size that fits within 24 GB of VRAM.
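The learning-rate schedule quoted above (linear warm-up to 1e-4 over the first 10% of steps, then cosine annealing) can be sketched per step like this. The function name is illustrative, and decaying all the way to zero is an assumption; the report does not state the final learning rate.

```python
import math

def lr_at_step(step, total_steps, peak_lr=1e-4, warmup_frac=0.1):
    """Per-step learning rate: linear warm-up to peak_lr over the first
    warmup_frac of training, then cosine annealing over the remainder.
    (Annealing to zero is an assumption, not stated in the report.)
    """
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # linear warm-up: 0 -> peak_lr over the first 10% of steps
        return peak_lr * (step + 1) / warmup_steps
    # cosine annealing over the remaining 90% of steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With the DiT-XL/2 setting of 5000 gradient steps, the warm-up covers steps 0–499 and the cosine decay spans the remaining 4500 steps; the same schedule shape would apply to the SD2.1 and SDXL runs with their respective step counts.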