Visual Generation Without Guidance
Authors: Huayu Chen, Kai Jiang, Kaiwen Zheng, Jianfei Chen, Hang Su, Jun Zhu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments across five distinct visual models demonstrate the effectiveness and versatility of GFT. Across domains of diffusion, autoregressive, and masked-prediction modeling, GFT consistently achieves comparable or even lower FID scores, with similar diversity-fidelity trade-offs compared with CFG baselines, all while being guidance-free. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science & Technology, Tsinghua University 2Shengshu, Beijing, China. Correspondence to: Jun Zhu <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Guidance-Free Training (Diffusion) |
| Open Source Code | Yes | Code: https://github.com/thu-ml/GFT. |
| Open Datasets | Yes | We train C2I models on ImageNet 256×256 (Deng et al., 2009). For T2I models, we use a subset of the LAION-Aesthetic 5+ (Schuhmann et al., 2022), consisting of 18 million image-text pairs. Our codebases are directly modified from the official CFG implementation of each respective baseline, keeping most hyperparameters consistent with CFG training. We use official OpenAI evaluation scripts to evaluate our C2I models. For T2I models, we evaluate our model on zero-shot COCO 2014 (Lin et al., 2014). |
| Dataset Splits | Yes | For evaluation, following GigaGAN (Kang et al., 2023) and DMD (Yin et al., 2024), we generate images using 30K prompts from the COCO 2014 (Lin et al., 2014) validation set, downsample them to 256 × 256, and compare with 40,504 real images from the same validation set. |
| Hardware Specification | Yes | We use 8 × 80GB H100 GPU cards. (Table 1 caption) We employ a mix of H100, A100 and A800 GPU cards for experimentation. (Appendix D) |
| Software Dependencies | No | The paper mentions software like "DPM-Solver++ (Lu et al., 2022)" and refers to official codebases for baselines, but does not specify version numbers for any libraries or programming languages. |
| Experiment Setup | Yes | For all models, we keep training hyperparameters and other design choices consistent with their official codebases if not otherwise stated. We employ a mix of H100, A100 and A800 GPU cards for experimentation. DiT. We mainly apply GFT to fine-tune DiT-XL/2 (28 epochs, 2% of pretraining epochs) and train DiT-B/2 from scratch (80 epochs, following the original DiT paper's settings (Peebles & Xie, 2023)). Since the DiT-B/2 pretraining checkpoint is not publicly available, we reproduce its pretraining experiment. For all experiments, we use a batch size of 256 and a learning rate of 1e-4. For DiT-XL/2 fine-tuning experiments, we employ a cosine-decay learning rate scheduler. ... (and similar details for VAR, LlamaGen, MAR, and Stable Diffusion 1.5 in Appendix D) |
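The cosine-decay learning-rate schedule quoted in the Experiment Setup row (peak learning rate 1e-4 for the DiT-XL/2 fine-tuning runs) can be sketched as a standalone schedule function. The total step count and minimum learning rate below are illustrative assumptions; the excerpt does not state them.

```python
import math

def cosine_decay_lr(step, total_steps, peak_lr=1e-4, min_lr=0.0):
    """Cosine-decay schedule: starts at peak_lr and decays to min_lr.

    peak_lr=1e-4 matches the learning rate reported in the paper excerpt;
    total_steps and min_lr are hypothetical values for illustration.
    """
    progress = min(step / total_steps, 1.0)  # fraction of training completed
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# The schedule returns the peak rate at step 0, half the peak at the
# midpoint, and min_lr at the final step.
```

In practice a framework scheduler (e.g. a cosine-annealing scheduler in the training library of choice) would implement the same curve; this sketch only makes the reported hyperparameters concrete.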