Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization
Authors: Jiajun Fan, Shuaike Shen, Chaoran Cheng, Yuxin Chen, Chumeng Liang, Ge Liu
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on tasks including target image generation, image compression, and text-image alignment demonstrate the effectiveness of our method, which achieves optimal policy convergence while allowing controllable trade-offs between reward maximization and diversity preservation. |
| Researcher Affiliation | Academia | Jiajun Fan (1), Shuaike Shen (2), Chaoran Cheng (1), Yuxin Chen (3), Chumeng Liang (4), Ge Liu (1) — 1: University of Illinois Urbana-Champaign, 2: Zhejiang University, 3: Tsinghua University, 4: University of Southern California. |
| Pseudocode | Yes | More experimental details can be found in App. D and E with our Pseudocode in App. H. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of *their own* implementation code. It states that their code is *built upon* existing open-source codebases like Torch CFM and Diffusers, but does not confirm their specific modifications or additions are publicly available. |
| Open Datasets | Yes | At first, we evaluate our ORW-CFM method, with and without W2 regularization, by fine-tuning a model pre-trained on MNIST (LeCun et al., 1998) to generate only even numbers using reward signals. For the image compression task, we followed the reward function from DDPO (Black et al., 2024), where the goal was to minimize the file size of the images after JPEG compression. The reward r(x) is proportional to the compression rate. We controlled the balance between compression and diversity using the regularization parameter α, which induces varying degrees of divergence between the fine-tuned model and the reference model. A lower α prioritizes compressibility, leading the model to generate images that occupy minimal storage space after compression, while a higher α increases diversity in the generated outputs. |
| Dataset Splits | No | The paper mentions using MNIST and CIFAR-10 datasets and that they utilized existing unconditional flow matching training baselines from Torch CFM. However, it does not explicitly state the specific training, validation, or test splits (e.g., percentages or exact counts) used for these datasets in the main text or appendices. |
| Hardware Specification | Yes | All experiments were conducted on a system with one worker equipped with an 8-core CPU, an NVIDIA A10 GPU, and 32 GB of memory. |
| Software Dependencies | No | The paper mentions several software components like Torch CFM, Diffusers, U-Net, and LoRA, and also specifies fp16 precision. However, it does not provide specific version numbers for these software dependencies, which are required for a reproducible description. |
| Experiment Setup | Yes | We have conducted detailed ablation experiments in the main text and appendix on the two most important hyperparameters in our method: the entropy control parameter τ and the W2 distance regularization parameter α. For all experiments, we use a batch size of 64 and employ a separate test process to evaluate our performance. |
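The compressibility reward quoted above (r(x) proportional to the compression rate) and the entropy control parameter τ can be sketched as follows. This is an illustrative proxy only, not the authors' implementation: `zlib` stands in for JPEG compression so the snippet stays self-contained, and the exponential weighting `exp(r / tau)` is one common reward-weighting form assumed here, not confirmed by the quoted text.

```python
import math
import zlib


def compressibility_reward(image_bytes: bytes) -> float:
    """Reward proportional to the compression rate: the smaller the
    compressed size relative to the original, the higher the reward.
    zlib is used here as a stdlib stand-in for JPEG compression."""
    compressed = zlib.compress(image_bytes, level=9)
    return 1.0 - len(compressed) / len(image_bytes)


def reward_weight(r: float, tau: float = 0.1) -> float:
    """Exponential reward weighting exp(r / tau); a smaller tau makes the
    weighting more peaked on high-reward samples (an assumed form of the
    entropy control parameter described in the table)."""
    return math.exp(r / tau)
```

For example, a highly redundant image buffer (all zeros) compresses well and earns a reward near 1, while a zero reward maps to a neutral weight of 1 regardless of τ.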