Multi-Reward as Condition for Instruction-based Image Editing

Authors: Xin Gu, Ming Li, Libo Zhang, Fan Chen, Longyin Wen, Tiejian Luo, Sijie Zhu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments indicate that our multi-reward conditioned model outperforms its no-reward counterpart on two popular editing pipelines, i.e., InsPix2Pix and SmartEdit. Extensive experiments show that the proposed method can be combined with existing editing models for a significant performance boost on all three perspectives, achieving state-of-the-art performance in both GPT-4o and human evaluation.
Researcher Affiliation | Collaboration | Xin Gu1,2, Ming Li1, Libo Zhang2,3, Fan Chen1, Longyin Wen1, Tiejian Luo2, Sijie Zhu1; 1ByteDance Inc., 2University of Chinese Academy of Sciences, 3Institute of Software, Chinese Academy of Sciences
Pseudocode | No | The paper describes methods using mathematical equations and structured steps in paragraph text within Section 4 ('METHODOLOGY'), but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code is released at https://github.com/bytedance/Multi-Reward-Editing.
Open Datasets | Yes | The most widely used InsPix2Pix (Brooks et al., 2023) dataset is created with a pretrained text-to-image Stable Diffusion (SD) model (Rombach et al., 2022)... We first carefully selected 80 high-quality images from the Unsplash website as the original images... Code is released at https://github.com/bytedance/Multi-Reward-Editing. (The Unsplash dataset link is provided implicitly within the references: https://github.com/unsplash/datasets)
Dataset Splits | Yes | First, we randomly selected 20K training triplets from the InsPix2Pix dataset, where each triplet contains an original image, an edited image, and an editing instruction. To evaluate the editing models on real-world photos and diverse instructions covering 7 major categories (defined in Sec. 5), we create an evaluation set with 80 high-quality Unsplash (uns) photos and 560 challenging instructions, which are initially generated by GPT-4o and verified by human annotators.
Hardware Specification | No | No specific hardware details such as GPU model, CPU, or memory were provided for running the experiments.
Software Dependencies | No | Our method is implemented in Python using PyTorch. This statement does not include specific version numbers for Python, PyTorch, or any other libraries used.
Experiment Setup | Yes | During training, we only optimize the MRC module, the U-Net module, the reward encoder, and the connected linear layers. We use the Adam (Kingma, 2014) optimizer with an initial learning rate of 5e-5, a weight decay of 1e-2, and a warm-up ratio of 0. We resize the images to 256 and apply random cropping during training, and resize the shorter side to 512 during inference.