Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA

Authors: James Seale Smith, Yen-Chang Hsu, Lingyu Zhang, Ting Hua, Zsolt Kira, Yilin Shen, Hongxia Jin

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we start by analyzing existing customized diffusion methods in the popular Stable Diffusion model (Rombach et al., 2022), showing that these models catastrophically fail for sequentially arriving fine-grained concepts (we specifically use human faces and landmarks). ... We show that C-LoRA not only outperforms several baselines for our proposed setting of text-to-image continual customization, which we refer to as Continual Diffusion, but that we achieve a new state-of-the-art in the well-established rehearsal-free continual learning setting for image classification. ... Qualitative results are shown in Figure 4 showing samples from task 1, 6, and 10 after training all 10 tasks, while quantitative results are given in Table 1.
Researcher Affiliation | Collaboration | James Seale Smith (Samsung Research America; Georgia Institute of Technology), Yen-Chang Hsu (Samsung Research America), Lingyu Zhang (Samsung Research America), Ting Hua (Samsung Research America), Zsolt Kira (Georgia Institute of Technology), Yilin Shen (Samsung Research America), Hongxia Jin (Samsung Research America)
Pseudocode | No | The paper describes the C-LoRA method, self-regularization, and customized token strategy in sections 3.1, 3.2, and 3.3 respectively, using mathematical formulas and descriptive text. However, it does not present these as a clearly labeled pseudocode or algorithm block.
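Since the paper gives the method only as formulas and prose, a minimal NumPy sketch of C-LoRA's self-regularization idea (penalizing new low-rank updates in weight positions that past LoRA deltas have already changed) might look like the following. The function name, the exact penalty form, and the toy shapes are assumptions based on the paper's description, not the authors' implementation.

```python
import numpy as np

def c_lora_self_reg(past_deltas, A_new, B_new, lam=1e8):
    """Hypothetical sketch of C-LoRA-style self-regularization.

    Penalizes the new low-rank update A_new @ B_new elementwise,
    weighted by how much past LoRA deltas already changed each
    weight. `lam` mirrors the paper's reported loss weight of 1e8.
    """
    # Accumulated magnitude of all past LoRA weight changes.
    past_change = np.abs(sum(past_deltas))
    # New updates are cheap where past change was small,
    # expensive where it was large.
    return lam * np.sum((past_change * (A_new @ B_new)) ** 2)

# Toy usage: rank-2 LoRA factors for a 4x4 weight matrix.
rng = np.random.default_rng(0)
past = [rng.normal(size=(4, 4))]
A, B = rng.normal(size=(4, 2)), rng.normal(size=(2, 4))
loss = c_lora_self_reg(past, A, B)
```

With no past tasks (all-zero deltas) the penalty vanishes, so the first task trains as plain LoRA; later tasks are steered toward weight positions the earlier tasks left untouched.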
Open Source Code | No | The paper states: "For additional analysis on the efficacy for LoRA (Hu et al., 2021) for text-to-image diffusion, we suggest the implementation by Ryu, which was a concurrent project to ours." This refers to a third-party implementation and not the authors' own source code for the methodology described in the paper.
Open Datasets | Yes | We first benchmark our method using the 512x512 resolution (self-generated) celebrity faces dataset, CelebFaces Attributes (CelebA) HQ (Karras et al., 2017; Liu et al., 2015). ... As an additional dataset, we demonstrate the generality of our method and introduce an additional dataset with a different domain, benchmarking on a 10 length task sequence using the Google Landmarks dataset v2 (Weyand et al., 2020). ... We benchmark our approach using ImageNet-R (Hendrycks et al., 2021; Wang et al., 2022b).
Dataset Splits | No | For the CelebA-HQ dataset, the paper states: "We sample 10 celebrities at random which have at least 15 individual training images each. Each celebrity customization is considered a task". For ImageNet-R, it mentions "10 tasks (20 classes per task)", "5-task", and "20 task" sequences. While these describe how tasks are defined or sampled, the paper does not provide training/validation/test split percentages, absolute sample counts, or explicit references to predefined splits within each task, which would be needed for direct reproduction.
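The task construction the paper does describe (e.g. ImageNet-R partitioned into 10 tasks of 20 classes each) can be sketched as a simple disjoint partition; the class count of 200 and the shuffling seed below are illustrative assumptions, since the paper does not state how classes are assigned to tasks.

```python
import random

def make_task_sequence(num_classes=200, num_tasks=10, seed=0):
    """Partition class labels into disjoint continual-learning tasks,
    e.g. ImageNet-R's 200 classes into 10 tasks of 20 classes each."""
    classes = list(range(num_classes))
    random.Random(seed).shuffle(classes)  # assumed random class-to-task assignment
    per_task = num_classes // num_tasks
    return [classes[i * per_task:(i + 1) * per_task] for i in range(num_tasks)]

tasks = make_task_sequence()
```

Note this only defines the task sequence; it says nothing about per-task train/val/test splits, which is exactly the missing detail the row above flags.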
Hardware Specification | Yes | We use 2 A100 GPUs to generate all results.
Software Dependencies | No | We implement our method and all baselines in PyTorch (Paszke et al., 2019). While PyTorch is mentioned, no specific version number is provided, nor are any other software dependencies with version numbers.
Experiment Setup | Yes | For the most part, we use the same implementation details as Custom Diffusion (Kumari et al., 2022) with 2000 training iterations... For LoRA, we searched for the rank using a simple exponential sweep and found that a rank of 16 sufficiently learns all concepts. Additional training details are located in Appendix C. ... We found a learning rate of 5e-6 worked best for all non-LoRA methods, and a learning rate of 5e-4 worked best for our LoRA methods. We found a loss weight of 1e6 and 1e8 worked best for EWC (Kirkpatrick et al., 2017) and C-LoRA respectively. ... We found a rank of 16 was sufficient for LoRA for the text-to-image experiments, and 64 for the image classification experiments. ... All images are generated with a 512x512 resolution, and we train for 2000 steps on the face datasets and 500 steps on the waterfall datasets.
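Collected in one place, the reported hyperparameters could be captured in a config like the one below. The dictionary structure and key names are illustrative assumptions; the values themselves come directly from the quoted setup.

```python
# Hyperparameters reported in the paper's experiment setup.
# Structure and key names are illustrative, not from the authors.
CONFIG = {
    "text_to_image": {
        "lora_rank": 16,
        "lr_lora": 5e-4,          # LoRA-based methods
        "lr_non_lora": 5e-6,      # all non-LoRA methods
        "loss_weight_ewc": 1e6,   # EWC regularization weight
        "loss_weight_c_lora": 1e8,
        "train_steps_faces": 2000,
        "train_steps_waterfalls": 500,
        "resolution": 512,
    },
    "image_classification": {
        "lora_rank": 64,
    },
}
```

Such a consolidated config is what a reproduction attempt would need; note it still omits unstated details (optimizer, batch size, exact schedules) deferred to the paper's Appendix C.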