Leveraging Model Guidance to Extract Training Data from Personalized Diffusion Models
Authors: Xiaoyu Wu, Jiaru Zhang, Steven Wu
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on DMs fine-tuned with datasets including WikiArt, DreamBooth, and real-world checkpoints posted online validate the effectiveness of our method, extracting about 20% of fine-tuning data in most cases. The code is available¹. Experimental results on fine-tuned checkpoints on various datasets (WikiArt, DreamBooth), various DMs and real-world checkpoints from Hugging Face validate the effectiveness of our methods. |
| Researcher Affiliation | Academia | 1Carnegie Mellon University, Pittsburgh, PA 15213, USA 2Purdue University, West Lafayette, IN 47907, USA. |
| Pseudocode | Yes | Algorithm 1 Hard Prompt Extraction on Fine-tuned Caption for Linear Layers |
| Open Source Code | Yes | The code is available¹. ¹https://github.com/Nicholas0228/FineXtract |
| Open Datasets | Yes | Experiments on DMs fine-tuned with datasets including WikiArt, DreamBooth, and real-world checkpoints posted online validate the effectiveness of our method... For style-driven generation... we randomly select 20 artists, each with 10 images, from the WikiArt dataset (Nichol, 2016). For object-driven generation... we experiment on 30 objects from the DreamBooth dataset (Ruiz et al., 2023)... |
| Dataset Splits | No | The paper specifies the number of images used for fine-tuning (N₀) and for generation (N), e.g., 'we randomly select 20 artists, each with 10 images, from the WikiArt dataset' and 'By default, we set the generation count N to 50N₀, where N₀ represents the number of training images.' However, it does not provide explicit training/test/validation splits for a larger dataset that would be used to evaluate the model or the extraction method in a traditional ML sense. |
| Hardware Specification | Yes | The batch size is fixed at 5, and all experiments are conducted on a single A100 GPU. |
| Software Dependencies | No | The paper mentions using 'training script provided by Diffusers' but does not specify version numbers for Diffusers or any other software libraries or programming languages. |
| Experiment Setup | Yes | For DreamBooth, the guidance scale w for both FineXtract and CFG is set to 3.0 by default, with the correction term scale k set to -0.02 in Equations 8 and 13. For LoRA, w is set to 5.0 for FineXtract and 3.0 for CFG, respectively. By default, the number of training steps is set to 200N₀, with a learning rate of 2×10⁻⁶. The batch size is set to 1. The rank is fixed to 64 to ensure the fine-tuning process captures fine-grained details of training samples. |
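The quoted setup reports guidance scales w for both the paper's FineXtract guidance and standard classifier-free guidance (CFG). The paper's own guidance terms (Equations 8 and 13, with correction scale k) are not reproduced in this review, but the role of w in the standard CFG baseline it is compared against can be sketched. The function below is a generic illustration of the well-known CFG combination, not the paper's method; all names are illustrative.

```python
import numpy as np

def cfg_noise_estimate(eps_uncond: np.ndarray,
                       eps_cond: np.ndarray,
                       w: float) -> np.ndarray:
    """Standard classifier-free guidance: extrapolate from the
    unconditional noise prediction toward the conditional one.
    w is the guidance scale (the review quotes w = 3.0 or 5.0)."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy check with placeholder predictions: w = 1 recovers the
# purely conditional estimate; larger w amplifies the conditional signal.
eps_u = np.zeros(4)
eps_c = np.ones(4)
print(cfg_noise_estimate(eps_u, eps_c, 1.0))  # -> [1. 1. 1. 1.]
print(cfg_noise_estimate(eps_u, eps_c, 3.0))  # -> [3. 3. 3. 3.]
```

A larger w pushes sampling harder toward the conditioning signal, which is why extraction-oriented guidance methods tune it per fine-tuning setup (3.0 for DreamBooth, 5.0 for LoRA in the quoted configuration).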