Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence

Authors: Yinbin Han, Meisam Razaviyayn, Renyuan Xu

ICML 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we evaluate the performance of the PI-FT algorithm from Section 3 via numerical experiments, focusing on the following questions: In practice, how fast does the PI-FT algorithm converge to the optimal solution? How does the choice of β affect the convergence rate and the quality of the fine-tuned models? As shown in this section, the PI-FT algorithm converges efficiently to the global optimum; increasing β accelerates convergence and yields a model closer to the pre-trained one, aligning with our theoretical analysis in Section 3. Model Setup. We fine-tune Stable Diffusion v1.5 (Rombach et al., 2022) for text-to-image generation, using LoRA (Hu et al., 2022) and ImageReward (Xu et al., 2023). Following Fan et al. (2024), we use four prompts, "A green colored rabbit," "A cat and a dog," "Four wolves in the park," and "A dog on the moon," to evaluate the model's ability to generate correct color, composition, counting, and location, respectively. During training, we generate 10 trajectories, each consisting of 50 transitions, to calculate the gradient, with 1000 gradient steps. By default, we use the AdamW optimizer with a learning rate of 3 × 10⁻⁴ and set the KL regularization coefficient to a fixed value of β = 0.01.
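The excerpt above reports that a larger KL coefficient β keeps the fine-tuned model closer to the pre-trained one. A minimal numeric illustration of why, assuming a control update of the form u = s_pre + (α σ² / ((1 − α) β)) ∇E[V] as in the paper's Algorithm 1: the deviation from the pre-trained score scales like 1/β. The constants below are illustrative stand-ins, not values from the paper.

```python
# Hedged illustration: deviation of the fine-tuned control from the
# pre-trained score shrinks as the KL coefficient beta grows.
alpha, sigma, grad = 0.9, 0.5, 1.0   # illustrative values only
deviations = []
for beta in (0.01, 0.1, 1.0):
    # |u - s_pre| = alpha * sigma^2 / ((1 - alpha) * beta) * |grad E[V]|
    deviation = alpha * sigma**2 / ((1 - alpha) * beta) * abs(grad)
    deviations.append(deviation)
    print(f"beta={beta:.2f}: |u - s_pre| scale = {deviation:.2f}")
```

Increasing β by a factor of 10 shrinks the deviation by the same factor, consistent with the reported behavior.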
Researcher Affiliation Academia ¹Department of Finance and Risk Engineering, New York University; ²Daniel J. Epstein Department of Industrial and Systems Engineering, University of Southern California. Correspondence to: Renyuan Xu <EMAIL>.
Pseudocode Yes Algorithm 1 Policy Iteration for Fine-Tuning (PI-FT)
1: Input: expected reward function r(·), pre-trained model {s_t^pre}_{t=0}^{T}, and numbers of iterations {m_t}_{t=0}^{T−1}.
2: Set V_T^{(m_T)}(y) = r(y) for all y ∈ R^d.
3: for t = T − 1, …, 0 do
4:   Set u_t^{(0)}(y) = s_t^pre(y).
5:   for m = 0, 1, …, m_t − 1 do
6:     Update the control using
         u_t^{(m+1)}(y) = (α_t σ_t²) / ((1 − α_t) β_t) · ∇ E[V_{t+1}^{(m_{t+1})}(y^{(m)})] + s_t^pre(y),   (18)
       where y^{(m)} = (1/√α_t)(y + (1 − α_t) u_t^{(m)}(y)) + σ_t W_t.
7:   end for
8:   Compute the value function V_t^{(m_t)} using
         V_t^{(m_t)}(y) = E[V_{t+1}^{(m_{t+1})}(y^{(m_t)})] + β_t (1 − α_t)² / (2 α_t σ_t²) · ‖u_t^{(m_t)}(y) − s_t^pre(y)‖₂².
9: end for
10: return {u_t^{(m_t)}}_{t=0}^{T−1} and {V_t^{(m_t)}}_{t=0}^{T}.
Open Source Code No The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide a direct link to a code repository. Citations to third-party tools or general project overviews are not sufficient.
Open Datasets No The paper mentions fine-tuning Stable Diffusion v1.5 and evaluating with ImageReward. While these are well-known models/metrics, the paper describes using "a small sample set with human feedback" for fine-tuning without providing concrete access information (a link, DOI, or specific citation with author/year for the dataset itself) for the particular dataset used in the experiments. Therefore, the dataset used for fine-tuning in these specific experiments is not made publicly accessible.
Dataset Splits No The paper states, "During training, we generate 10 trajectories, each consisting of 50 transitions, to calculate the gradient with 1000 gradient steps." This describes the generation process and training steps rather than a specific division of a pre-existing dataset into training, validation, or testing splits.
Hardware Specification No The paper does not provide any specific hardware details such as GPU models, CPU types, or cloud computing instance specifications used for running the experiments. It only mentions the software artifacts involved (e.g., fine-tuning Stable Diffusion v1.5), not the hardware they ran on.
Software Dependencies No The paper mentions using the "AdamW optimizer" and references "Stable Diffusion v1.5" and "LoRA", but it does not specify any software libraries (e.g., PyTorch, TensorFlow) or their version numbers that would be required to reproduce the experiments. The optimizer name alone is not sufficient.
Experiment Setup Yes During training, we generate 10 trajectories, each consisting of 50 transitions, to calculate the gradient, with 1000 gradient steps. By default, we use the AdamW optimizer with a learning rate of 3 × 10⁻⁴ and set the KL regularization coefficient to a fixed value of β = 0.01. For a fair comparison, we configure DPOK to perform 10 gradient steps per sampling step, using a learning rate of 1 × 10⁻⁵. Each gradient step is computed using 50 randomly sampled transitions from a replay buffer.
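The KL-regularized objective implied by the setup above (10 trajectories × 50 transitions, β = 0.01) can be sketched as follows. This is a hedged toy computation, not the paper's code: the Gaussian-policy means, the shared variance, and the rewards are random stand-ins, and the equal-variance Gaussian KL formula ‖μ_ft − μ_pre‖² / (2σ²) is an illustrative assumption.

```python
import numpy as np

# Hedged sketch of a per-trajectory KL-regularized objective:
# mean terminal reward minus beta times the summed per-step KL penalty.
rng = np.random.default_rng(1)
n_traj, n_steps, beta = 10, 50, 0.01

# Stand-ins for per-step means of the fine-tuned and pre-trained policies.
mu_ft = rng.normal(size=(n_traj, n_steps))
mu_pre = mu_ft + 0.1 * rng.normal(size=(n_traj, n_steps))
sigma2 = 0.25                                  # shared per-step variance

reward = rng.normal(size=n_traj)               # toy terminal reward per trajectory

# KL between two Gaussians with equal variance: ||mu_ft - mu_pre||^2 / (2 sigma^2)
kl_per_step = (mu_ft - mu_pre) ** 2 / (2.0 * sigma2)
objective = reward.mean() - beta * kl_per_step.sum(axis=1).mean()
```

Since the KL penalty is non-negative, the regularized objective is bounded above by the mean reward, and β trades reward maximization against staying close to the pre-trained policy.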