ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance

Authors: Jiannan Huang, Jun Hao Liew, Hanshu Yan, Yuyang Yin, Yao Zhao, Humphrey Shi, Yunchao Wei

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive qualitative and quantitative experiments demonstrate that the use of semantic preservation loss effectively improves the compositional abilities of fine-tuning models. Lastly, we also extend our ClassDiffusion to personalized video generation, demonstrating its flexibility.
Researcher Affiliation | Collaboration | 1 Institute of Information Science, Beijing Jiaotong University; 2 Visual Intelligence + X International Joint Laboratory of the Ministry of Education; 3 ByteDance Inc.; 4 SHI Labs @ Georgia Tech
Pseudocode | Yes | Algorithm 1: Algorithm to Convert Character Set to 2D Point Set
Open Source Code | No | The paper discusses the source code of a third-party tool or platform that the authors used for a baseline method (SVDiff), but does not provide their own implementation code for ClassDiffusion or state that their code is open source.
Open Datasets | Yes | Datasets: Following previous work [29, 66, 75], we conduct quantitative experiments on the DreamBooth Dataset [66]. It contains 30 objects, including both live and non-live objects. In addition, we used images from the Textual Inversion Dataset [20] and CustomConcept101 [38] in qualitative experiments.
Dataset Splits | No | The paper mentions using well-known datasets such as the DreamBooth Dataset, the Textual Inversion Dataset, and CustomConcept101, which likely have standard splits defined in their respective original works. However, this paper does not explicitly state the training, validation, or test splits (e.g., percentages, sample counts, or specific methodology) used for these datasets within its own text.
Hardware Specification | Yes | All experiments are conducted on 2 RTX4090 GPUs.
Software Dependencies | Yes | Our method is built on Stable Diffusion v1.5, with a learning rate of 10⁻⁶ and a batch size of 2 for fine-tuning. We used 500 optimization steps for a single concept and 800 for multiple concepts, respectively. During inference, the guidance scale is set to 6.0 and the number of inference steps to 100. The semantic preservation loss weight is set to 1.0 in all experiments. All experiments are conducted on 2 RTX4090 GPUs. Our method takes 6 min to generate single concepts and 11 min for multiple concepts. To better preserve the semantic space, we compute SPL between text embeddings embedded in the semantic space of the Stable Diffusion model. Therefore, we utilize the CLIP [61] text encoder from Stable Diffusion v1.5 [63], specifically clip-vit-large-patch14 [47], to extract the text embeddings of phrases. Following common practice, we use the End of Sequence (EOS) token to represent the semantics of the embeddings.
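The SPL computation quoted above can be sketched as follows. This is a minimal illustration only: it assumes SPL is a cosine-style distance between the EOS-token embeddings of the personalized phrase and its plain class phrase, and the `eos_embedding` helper is a hypothetical deterministic stand-in for the real clip-vit-large-patch14 text encoder, so the snippet runs without model weights.

```python
import math
import random

EMBED_DIM = 768  # hidden size of clip-vit-large-patch14


def eos_embedding(phrase: str) -> list[float]:
    """Stand-in for the CLIP text encoder's EOS-token embedding.

    In the real pipeline this vector would be the EOS-token hidden state
    from the Stable Diffusion v1.5 text encoder; here we generate a
    deterministic pseudo-random vector per phrase for illustration.
    """
    rng = random.Random(phrase)  # same phrase -> same vector
    return [rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]


def semantic_preservation_loss(personal_phrase: str, class_phrase: str) -> float:
    """Cosine distance between EOS embeddings of the two phrases.

    E.g. personal_phrase = "a photo of a sks dog",
         class_phrase    = "a photo of a dog".
    """
    u = eos_embedding(personal_phrase)
    v = eos_embedding(class_phrase)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm
```

Identical phrases give a loss of zero, and during fine-tuning this term (weighted by 1.0, per the reported settings) would pull the personalized phrase's embedding back toward its class phrase.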
Experiment Setup | Yes | Implementation details: Our method is built on Stable Diffusion v1.5, with a learning rate of 10⁻⁶ and a batch size of 2 for fine-tuning. We used 500 optimization steps for a single concept and 800 for multiple concepts, respectively. During inference, the guidance scale is set to 6.0 and the number of inference steps to 100. The semantic preservation loss weight is set to 1.0 in all experiments.
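For quick reference, the reported hyperparameters can be collected into a single configuration sketch. Only the values come from the paper; the dictionary and its key names are hypothetical, not part of any released ClassDiffusion code.

```python
# Hypothetical config collecting the hyperparameters reported in the paper.
CLASSDIFFUSION_CONFIG = {
    "base_model": "stable-diffusion-v1-5",   # backbone used for fine-tuning
    "learning_rate": 1e-6,                   # fine-tuning learning rate
    "batch_size": 2,                         # fine-tuning batch size
    "train_steps": {
        "single_concept": 500,               # optimization steps, one concept
        "multi_concept": 800,                # optimization steps, multiple concepts
    },
    "guidance_scale": 6.0,                   # classifier-free guidance at inference
    "inference_steps": 100,                  # denoising steps at inference
    "spl_weight": 1.0,                       # semantic preservation loss weight
}
```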