Control and Realism: Best of Both Worlds in Layout-to-Image without Training

Authors: Bonan Li, Yinhan Hu, Songhua Liu, Xinchao Wang

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that WinWinLay excels in controlling element placement and achieving photorealistic visual fidelity, outperforming the current state-of-the-art methods.
Researcher Affiliation | Academia | 1 University of Chinese Academy of Sciences, Beijing, China; 2 National University of Singapore, Singapore.
Pseudocode | No | The paper includes mathematical equations and descriptions of the method, but it does not contain a clearly labeled pseudocode block or algorithm figure.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code, nor does it include a link to a code repository.
Open Datasets | Yes | Akin to prior work (Chen et al., 2024d), we quantitatively evaluate our WinWinLay on COCO2014 (Lin et al., 2014) and Flickr30K (Plummer et al., 2015).
Dataset Splits | No | The paper states that evaluation is done on COCO2014 and Flickr30K akin to prior work, but it does not explicitly provide specific training/test/validation split percentages, sample counts, or detailed splitting methodology within the text.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions using "Stable Diffusion 1.5" and tools like "YOLOv7" and "CLIP-s", but it does not list specific software dependencies like programming languages (e.g., Python), libraries (e.g., PyTorch, TensorFlow), or their respective version numbers, which are essential for reproducibility.
Experiment Setup | Yes | We adopt Stable Diffusion 1.5 (Rombach et al., 2022), pre-trained on LAION-5B (Schuhmann et al., 2022a), as our base Text-to-Image model. During generation, we employ the DDIM sampler with 50 steps and set the guidance scale to 7.5. Since layout construction typically occurs during the early stages of denoising, we apply the layout constraint only within the initial 10 steps. The hyperparameter ρ of the non-local attention prior is set to 5/0 for max/min, respectively. For the adaptive update, the number of Langevin dynamics steps O is set to 4 and the signal-to-noise ratio r to 0.06.
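The reported sampling hyperparameters can be sketched as a simple schedule. This is a hypothetical illustration, not the authors' code: the function and field names are invented, and it only encodes the numbers quoted above (50 DDIM steps, guidance scale 7.5, layout constraint in the first 10 steps, O = 4 Langevin updates, SNR r = 0.06), not the actual attention-based layout optimization.

```python
# Illustrative sketch of the sampling schedule reported in the paper.
# All names here are hypothetical; only the numeric settings come from the text.

NUM_STEPS = 50        # DDIM sampling steps
GUIDANCE_SCALE = 7.5  # classifier-free guidance scale
LAYOUT_STEPS = 10     # layout constraint applied only in these early steps
LANGEVIN_STEPS = 4    # Langevin dynamics updates O per constrained step
SNR = 0.06            # signal-to-noise ratio r for the adaptive update

def sampling_schedule(num_steps=NUM_STEPS):
    """Return per-step settings: Langevin updates run only while the
    layout constraint is active (the first LAYOUT_STEPS steps)."""
    schedule = []
    for t in range(num_steps):
        constrained = t < LAYOUT_STEPS
        schedule.append({
            "step": t,
            "guidance": GUIDANCE_SCALE,
            "langevin_updates": LANGEVIN_STEPS if constrained else 0,
            "snr": SNR if constrained else None,
        })
    return schedule

schedule = sampling_schedule()
```

Under these settings, only the first 10 of the 50 denoising steps incur the extra Langevin updates, which matches the paper's observation that layout is fixed early in denoising.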