Local Conditional Controlling for Text-to-Image Diffusion Models

Authors: Yibo Zhao, Liang Peng, Yang Yang, Zekai Luo, Hengjia Li, Yao Chen, Zheng Yang, Xiaofei He, Wei Zhao, Qinglin Lu, Wei Liu, Boxi Wu

AAAI 2025

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "Extensive experiments demonstrate that our method can synthesize high-quality images aligned with the text prompt under local control conditions." (Section 4 Experiments; 4.1 Dataset and Evaluation; 4.2 Comparison with Baselines)
Researcher Affiliation: Collaboration. Yibo Zhao 1,2; Liang Peng 2; Yang Yang 1,2; Zekai Luo 1,2; Hengjia Li 1,2; Yao Chen 2,3; Zheng Yang 2; Xiaofei He 1,2; Wei Zhao 4; Qinglin Lu 5; Wei Liu 5; Boxi Wu 3*. Affiliations: 1 State Key Lab of CAD&CG, Zhejiang University; 2 Fabu Inc.; 3 School of Software Technology, Zhejiang University; 4 Xidian University; 5 Tencent Inc.
Pseudocode: No. The paper describes its methods in prose and mathematical equations (Eqs. 1-10) but contains no structured pseudocode or algorithm blocks.
Open Source Code: No. The paper contains no explicit statement about releasing source code and no link to a code repository.
Open Datasets: Yes. "We utilized the COCO (Lin et al. 2014) validation set with 80 object categories, selecting one random caption per image to create a dataset of 5k generated images."
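The dataset construction quoted above (one randomly chosen caption per COCO validation image, yielding 5k prompts) can be sketched as follows. This is a minimal illustration, not the authors' code: `build_prompt_set` and `captions_per_image` are hypothetical names, and in practice the caption lists would be loaded from the COCO 2014 validation annotation files.

```python
import random

def build_prompt_set(captions_per_image, num_images=5000, seed=0):
    """Pick one random caption per image to form the evaluation prompts.

    captions_per_image: dict mapping image_id -> list of caption strings
    (in practice loaded from the COCO validation annotations).
    """
    rng = random.Random(seed)  # fixed seed for a reproducible selection
    image_ids = sorted(captions_per_image)[:num_images]
    return {img_id: rng.choice(captions_per_image[img_id])
            for img_id in image_ids}
```

Each returned prompt then drives one image generation, giving the 5k-image evaluation set.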
Dataset Splits: No. The paper mentions using the COCO validation set and creating an "Attend-Condition dataset" but does not specify training/validation/test splits (percentages, counts, or an explicit splitting methodology) for the experimental evaluation.
Hardware Specification: No. The paper does not report the hardware used to run the experiments, such as GPU or CPU models.
Software Dependencies: No. The paper mentions several models and frameworks (e.g., Stable Diffusion, CLIP, BLIP-2) but does not specify the software libraries or version numbers needed to reproduce the experiments.
Experiment Setup: Yes. "Our initial objective in local control is to identify the most suitable object for generation within the control region at timestep t. The resulting object token indices are denoted as C^t_control. In our method, the sum of attention scores within the local control region is employed as the criterion: at denoising steps t > βT, we identify the object with the highest summed attention score within the local control region as C^t_control, where β is a hyperparameter acting on the total number of timesteps T. ... A β between 0.8 and 0.9 yields good results."
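The selection rule quoted above can be sketched as follows, assuming cross-attention maps are available as arrays. The names (`select_control_token`, `attn_maps`, `region_mask`) are illustrative placeholders, not taken from the authors' implementation.

```python
import numpy as np

def select_control_token(attn_maps, region_mask, object_token_indices):
    """Return the object token index with the largest summed
    cross-attention score inside the local control region."""
    # attn_maps: (num_text_tokens, H, W) cross-attention maps at timestep t
    # region_mask: (H, W) binary mask of the local control region
    scores = {idx: float((attn_maps[idx] * region_mask).sum())
              for idx in object_token_indices}
    return max(scores, key=scores.get)

def control_token_at_step(t, T, beta, attn_maps, region_mask, object_tokens):
    # The rule applies only at early denoising steps t > beta * T
    # (the paper reports beta between 0.8 and 0.9 works well).
    if t > beta * T:
        return select_control_token(attn_maps, region_mask, object_tokens)
    return None
```

Because diffusion sampling counts timesteps down from T to 0, the condition t > βT restricts the selection to the early, high-noise denoising steps.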