Instructing Text-to-Image Diffusion Models via Classifier-Guided Semantic Optimization

Authors: Yuanyuan Chang, Yinghua Yao, Tao Qin, Mengmeng Wang, Ivor Tsang, Guang Dai

IJCAI 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "Experiments further demonstrate that our method achieves high levels of disentanglement and strong generalization across different domains of data. ... Table 1 and Table 2 show quantitative results. We calculate the LPIPS metric (lower is better) [Zhang et al., 2018] for editing different attributes on human face data and species conversion between animal data." |
| Researcher Affiliation | Collaboration | (1) MOE Key Laboratory for Intelligent Networks and Network Security, Xi'an Jiaotong University; (2) Center for Frontier AI Research, Agency for Science, Technology and Research, Singapore; ... (4) Zhejiang University of Technology; (5) SGIT AI Lab, State Grid Corporation of China |
| Pseudocode | Yes | "Algorithm 1: Training algorithm" |
| Open Source Code | Yes | "Code is available at https://github.com/Chang-yuanyuan/CASO." |
| Open Datasets | Yes | "The datasets used include: FFHQ [Karras et al., 2019], AFHQ [Choi et al., 2020], CelebA-HQ [Karras, 2017] and Stanford Cars [Krause et al., 2013]." |
| Dataset Splits | No | "With a well-trained classifier, only 100-200 images are needed to train the embedding. ... We calculated the FID metrics [Heusel et al., 2017] for the AFHQ Cat and Dog datasets under unconditional reconstruction and guided reconstruction with cat and dog embeddings, respectively." |
| Hardware Specification | No | No specific hardware details (GPU/CPU models, etc.) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions using Stable Diffusion v1.5 and a VGG16 model, but does not provide version numbers for software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | "During training, L is set to 0.3T for human faces and 0.4T for others. During editing, for subtle features like eyebrows, we start to apply our direction from t ∈ [0.1T, 0.3T], while some coarse-grained changes like species require editing at earlier timesteps (t ∈ [0.8T, 0.9T]). The results we show in the main text are all produced with T = 50 timesteps. ... The final training objective is: $\min_{\{e_a\}_{a=1}^{K}} \mathcal{L}_{\mathrm{edit}} + \gamma \mathcal{L}_{\mathrm{rec}}$ (13)." |
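The timestep schedule quoted in the Experiment Setup row can be sketched as a small helper. This is illustrative only: the function name and the fixed T = 50 are assumptions; the two windows (subtle edits at t ∈ [0.1T, 0.3T], coarse edits at t ∈ [0.8T, 0.9T]) come from the paper excerpt above.

```python
def edit_window(T: int, kind: str) -> range:
    """Return the diffusion timesteps at which to apply the learned direction.

    Following the quoted setup: subtle attributes (e.g. eyebrows) are edited
    in t ∈ [0.1T, 0.3T], while coarse changes (e.g. species) need earlier
    timesteps, t ∈ [0.8T, 0.9T]. (Hypothetical helper, not the authors' code.)
    """
    lo, hi = {"subtle": (0.1, 0.3), "coarse": (0.8, 0.9)}[kind]
    return range(int(lo * T), int(hi * T) + 1)

# With T = 50 as in the paper's main-text results:
print(list(edit_window(50, "subtle")))  # timesteps 5 through 15
print(list(edit_window(50, "coarse")))  # timesteps 40 through 45
```

A real editing loop would check `t in edit_window(T, kind)` at each denoising step before adding the learned semantic direction.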
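The structure of the training objective in Eq. (13), a weighted sum $\mathcal{L}_{\mathrm{edit}} + \gamma \mathcal{L}_{\mathrm{rec}}$ minimized over the class embeddings $\{e_a\}_{a=1}^{K}$, can be sketched with toy quadratic stand-ins for the two loss terms. Everything here is assumed for illustration: the value of `gamma`, the toy losses, and the plain gradient-descent loop; the real terms come from the classifier guidance and the diffusion reconstruction loss, which are not reproduced in this review.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 3, 8     # K class embeddings of dimension D (toy sizes)
gamma = 0.1     # hypothetical weight on the reconstruction term

# Stand-ins for classifier-preferred directions (purely illustrative).
targets = rng.normal(size=(K, D))

# The embeddings {e_a}_{a=1}^K being optimized, initialized at zero.
e = np.zeros((K, D))

def losses(e):
    # Toy quadratics: L_edit pulls e toward the targets, L_rec keeps e small.
    l_edit = ((e - targets) ** 2).sum()
    l_rec = (e ** 2).sum()
    return l_edit, l_rec

lr = 0.05
for _ in range(200):
    # Gradient of L_edit + gamma * L_rec (the Eq. 13 shape) w.r.t. e
    grad = 2 * (e - targets) + gamma * 2 * e
    e -= lr * grad

l_edit, l_rec = losses(e)
print(round(float(l_edit + gamma * l_rec), 4))
```

With these quadratics the minimizer is `e = targets / (1 + gamma)`, so the loop can be checked against that closed form; in the actual method the gradients instead flow through the frozen classifier and the diffusion model.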