© Plug-in Authorization for Human Copyright Protection in Text-to-Image Model

Authors: Chao Zhou, Huishuai Zhang, Jiang Bian, Weiming Zhang, Nenghai Yu

TMLR 2025

Reproducibility assessment. Each entry lists the variable, the result, and the supporting excerpt (LLM response):
Research Type: Experimental. "Experiments in artist-style replication and cartoon IP recreation demonstrate the plug-in's effectiveness, offering a valuable solution for human copyright protection in the age of generative AI."
Researcher Affiliation: Collaboration. Chao Zhou (University of Science and Technology of China), Huishuai Zhang (Wangxuan Institute of Computer Technology, Peking University; National Key Laboratory of General Artificial Intelligence), Jiang Bian (Microsoft Research).
Pseudocode: Yes. "Algorithm 1 describes concrete steps of optimizing the objective (5)." Algorithm 1 (Combination: Easy Merge method). Input: a set S of indices of plug-ins to be combined, base model w, diffusion step T. Output: combined LoRA w_L.
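The Easy Merge algorithm above combines a set of authorized LoRA plug-ins into a single set of weights. As a rough illustration only (the paper's actual procedure optimizes objective (5); `easy_merge` and the equal-weight scheme here are assumptions), a naive merge sums the low-rank updates contributed by the selected plug-ins:

```python
import numpy as np

def easy_merge(loras, coeffs=None):
    """Combine a set of LoRA factor pairs (A, B) into one dense update.

    Each plug-in contributes a low-rank delta B @ A; the naive merge
    returns their (optionally weighted) sum. Hypothetical sketch only,
    not the paper's optimization-based combination.
    """
    if coeffs is None:
        coeffs = [1.0 / len(loras)] * len(loras)  # equal weighting (assumption)
    return sum(c * (B @ A) for c, (A, B) in zip(coeffs, loras))

rng = np.random.default_rng(0)
# toy example: two rank-2 plug-ins for a 4x4 attention weight
plugins = [(rng.normal(size=(2, 4)), rng.normal(size=(4, 2))) for _ in range(2)]
delta = easy_merge(plugins)
print(delta.shape)  # (4, 4)
```

In practice the merged delta would be added to the corresponding attention weight of the base model w before sampling.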
Open Source Code: Yes. "The code is available at https://github.com/zc1023/-Plug-in-Authorization.git"
Open Datasets: Yes. "We utilize 5000 textual captions selected from the validation set in MS-COCO (Lin et al., 2015) as prompts, generating 5000 images using SD1.5 and the non-infringing model that extracts R2D2 and Picasso, respectively."
Dataset Splits: Yes. "We utilize 5000 textual captions selected from the validation set in MS-COCO (Lin et al., 2015) as prompts, generating 5000 images using SD1.5 and the non-infringing model that extracts R2D2 and Picasso, respectively. In extraction, we sample 10 common contents (training set) leveraging ChatGPT to fine-tune the base model. These contents have been previously processed by the non-infringing model. Additionally, we generate 10 supplementary contents for evaluation. Figure 9 shows the images generated with these 20 contents. The images on the left represent the seen contents (training set), while those on the right are the unseen contents (evaluation set)."
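The seen/unseen split above pairs 10 training contents and 10 held-out contents with the target concept. A minimal sketch, assuming a simple prompt template; the content words and the `make_prompts` helper are illustrative placeholders, not the paper's actual ChatGPT-generated lists:

```python
# Illustrative seen/unseen content split for extraction fine-tuning
# vs. evaluation. All words below are placeholders (assumption).
seen_contents = ["dog", "house", "car", "tree", "boat",
                 "bridge", "flower", "bird", "train", "mountain"]  # training set
unseen_contents = ["cat", "castle", "bicycle", "river", "plane",
                   "tower", "garden", "fish", "bus", "desert"]     # evaluation set

def make_prompts(contents, concept="Picasso style"):
    """Pair each common content with the target concept (e.g. an artist style)."""
    return [f"a {c} in {concept}" for c in contents]

train_prompts = make_prompts(seen_contents)
eval_prompts = make_prompts(unseen_contents)
print(len(train_prompts), len(eval_prompts))  # 10 10
```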
Hardware Specification: No. The paper does not specify the hardware used for experiments (GPU models, CPU types, or memory); no explicit hardware environment is mentioned.
Software Dependencies: Yes. "In all experiments, we fine-tune the attention component in the U-Net architecture of Stable Diffusion Model v1.5, as described in Rombach et al. (2022a)."
Experiment Setup: Yes. "For both the de-concept process and the re-context process, the training consists of 10 iterations, with each iteration comprising 30 epochs. We use a learning rate of 1.5e-4, T = 50 steps for the diffusion process, and a rank of 40 for LoRA. For the combination operation, we use a learning rate of 1e-3 and a rank of 140 for LoRA."
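The reported hyperparameters can be collected into a single configuration sketch. The values come from the quoted setup; the key names are illustrative, not taken from the released code:

```python
# Hyperparameters reported for extraction (de-concept / re-context)
# and for the combination operation. Key names are illustrative.
EXTRACTION = {
    "iterations": 10,            # training iterations
    "epochs_per_iteration": 30,
    "learning_rate": 1.5e-4,
    "diffusion_steps": 50,       # T
    "lora_rank": 40,
}
COMBINATION = {
    "learning_rate": 1e-3,
    "lora_rank": 140,
}
print(EXTRACTION["lora_rank"], COMBINATION["lora_rank"])  # 40 140
```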