MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
Authors: Xierui Wang, Siming Fu, Qihan Huang, Wanggui He, Hao Jiang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive quantitative and qualitative experiments affirm that this method surpasses existing models in both image and text fidelity, promoting the development of personalized text-to-image generation. The experimental results demonstrate our method consistently outperforms the state-of-the-art approaches on all the benchmarks. We conclude the previous P-T2I works and provide an overall comparison in Table 1. The paper further delineates comprehensive ablation studies, underpinning the rationale behind our design decisions and affirming the efficacy of our proposed approach. For training, we utilize an in-house video dataset that contains 3.6M video clips. For evaluation, we measure the single-subject and multi-subject performance on Dream Bench (Ruiz et al., 2023) and MS-Bench, respectively. |
| Researcher Affiliation | Collaboration | Xierui Wang²⁺, Siming Fu²⁺, Qihan Huang², Wanggui He¹, Hao Jiang¹*. ¹Alibaba Group; ²Zhejiang University. ⁺Equal contribution; *Corresponding author |
| Pseudocode | No | The paper describes methods like Grounding Resampler and Multi-subject Cross-attention using mathematical formulations and textual descriptions, but it does not include any clearly labeled pseudocode blocks or algorithms. |
| Open Source Code | No | The project page is https://MS-Diffusion.github.io. This is a project page, not a direct link to a code repository. The paper does not explicitly state that the code is released or provide a specific code repository link. |
| Open Datasets | Yes | For evaluation, we measure the single-subject and multi-subject performance on Dream Bench (Ruiz et al., 2023) and MS-Bench, respectively. |
| Dataset Splits | No | The paper trains on an in-house video dataset (3.6M video clips) and evaluates on Dream Bench and MS-Bench. It describes MS-Bench as containing '1148 combinations and 4488 evaluation samples', but it does not provide explicit training/validation/test splits (percentages or exact counts) for any of these datasets. Dream Bench is a known benchmark, but its splits are not detailed in this paper, and for MS-Bench only the evaluation sample count is given, which is insufficient to reproduce a full split. |
| Hardware Specification | Yes | Implemented by Pytorch 2.0.1 and Diffusers 0.23.1, our model is trained on 16 A100 GPUs for 120k steps with a batch size of 8 and a learning rate of 1e-4. |
| Software Dependencies | Yes | Implemented by Pytorch 2.0.1 and Diffusers 0.23.1, our model is trained on 16 A100 GPUs for 120k steps with a batch size of 8 and a learning rate of 1e-4. |
| Experiment Setup | Yes | The pre-trained model employed in MS-Diffusion is Stable Diffusion XL (SDXL) (Podell et al., 2023). Implemented by Pytorch 2.0.1 and Diffusers 0.23.1, our model is trained on 16 A100 GPUs for 120k steps with a batch size of 8 and a learning rate of 1e-4. Following the training of IP-adapter (Ye et al., 2023), we set γ = 1.0 in cross-attention layers and dropped the text and image condition using the same probability. To ensure the model is not dependent on the grounding tokens (Section 3.4), we also randomly drop them with a probability of 0.1. We generate five images for each sample during the inference, with unconditional guidance scale and γ set to 7.5 and 0.6, respectively, to get better results. |
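The training and inference settings quoted above can be collected into a small configuration sketch. This is an illustrative reconstruction, not the authors' code: the class and function names are hypothetical, and the text/image condition drop probability is not stated in the excerpt (only the grounding-token drop probability of 0.1 is), so its default below is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class MSDiffusionConfig:
    """Hyperparameters quoted from the paper's experiment setup.

    Names are illustrative; only the values marked as quoted come
    from the excerpt above.
    """
    train_steps: int = 120_000        # quoted: 120k steps
    batch_size: int = 8               # quoted
    learning_rate: float = 1e-4       # quoted
    gamma_train: float = 1.0          # quoted: gamma in cross-attention layers
    cond_drop_prob: float = 0.1       # ASSUMPTION: paper only says text/image share one probability
    grounding_drop_prob: float = 0.1  # quoted: grounding tokens dropped with p=0.1
    num_inference_images: int = 5     # quoted: five images per sample
    guidance_scale: float = 7.5       # quoted: unconditional guidance scale
    gamma_inference: float = 0.6      # quoted: gamma at inference


def drop_conditions(cfg: MSDiffusionConfig, rng: random.Random):
    """Sample condition-dropout flags for one training step.

    Following the IP-Adapter recipe cited in the paper, text and image
    conditions are dropped with the same probability (interpreted here
    as a single joint draw, which is an assumption); grounding tokens
    are dropped independently so the model does not depend on them.
    """
    drop_text_image = rng.random() < cfg.cond_drop_prob
    drop_grounding = rng.random() < cfg.grounding_drop_prob
    return drop_text_image, drop_grounding
```

Keeping these values in one dataclass makes the quoted setup easy to check against a reimplementation, though whether text and image conditions are dropped jointly or independently cannot be resolved from the excerpt alone.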