RB-Modulation: Training-Free Stylization using Reference-Based Modulation

Authors: Litu Rout, Yujia Chen, Nataniel Ruiz, Abhishek Kumar, Constantine Caramanis, Sanjay Shakkottai, Wen-Sheng Chu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evidence (Sec. 6, Experiments): "Metrics: Evaluating stylized synthesis is challenging due to the subjective nature of style, making simple metrics inadequate. We follow a two-step approach: first using metrics from prior works and then conducting human evaluation. To evaluate prompt-image alignment, we use CLIP-T score (Hertz et al., 2023; Sohn et al., 2023; Wang et al., 2024a) and ImageReward (Xu et al., 2024), which also consider human aesthetics, distortions, and object completeness. When a style description is provided, CLIP-T and ImageReward also capture style alignment. We assess style similarity using DINO (Caron et al., 2021) and content similarity using CLIP-I (Radford et al., 2021) as in prior work (Hertz et al., 2023; Ruiz et al., 2023; Sohn et al., 2023), and highlight their limitations in disentangling style and content performance in evaluation. Given the importance of human evaluation in T2I personalization (Hertz et al., 2023; Sohn et al., 2023; Ruiz et al., 2023; Shah et al., 2023; Jeong et al., 2024), we also conduct a user study through Amazon Mechanical Turk to measure both style and text alignment."
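The CLIP-T and CLIP-I scores cited above reduce to cosine similarity between encoder embeddings (text vs. image, or image vs. image). A minimal illustrative sketch, with hand-made vectors standing in for real CLIP encoder outputs (the embedding values here are hypothetical, not from the paper):

```python
import numpy as np

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embeddings, the core of CLIP-T
    (prompt-image alignment) and CLIP-I (content similarity) scores."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical pre-computed embeddings; a real pipeline would obtain
# these from a CLIP text/image encoder.
text_emb = np.array([0.2, 0.9, 0.1])
image_emb = np.array([0.25, 0.85, 0.2])

clip_t = cosine_score(text_emb, image_emb)  # near 1.0 => strong alignment
```

A benchmark run would average such scores over all prompt-image (or reference-output) pairs in the evaluation set.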
Researcher Affiliation | Collaboration | Litu Rout (Google, UT Austin), Yujia Chen (Google), Nataniel Ruiz (Google), Abhishek Kumar (Google DeepMind), Constantine Caramanis (UT Austin), Sanjay Shakkottai (UT Austin), Wen-Sheng Chu (Google)
Pseudocode | Yes | Algorithm 1: RB-Modulation (Exact); Algorithm 2: RB-Modulation (Proximal)
Open Source Code | Yes | The source code is available on the project page: https://rb-modulation.github.io/.
Open Datasets | Yes | "We use style images from the StyleAligned benchmark (Hertz et al., 2023) for stylization and content images from DreamBooth (Ruiz et al., 2023) for content-style composition."
Dataset Splits | No | The paper evaluates on the StyleAligned benchmark and DreamBooth images, and reports a user study with 155 participants over 100 styles from the StyleAligned dataset (7,200 answers in total). Because the method is training-free, there are no train/validation/test splits to report: the paper specifies the evaluation datasets but gives no split percentages or sample counts, as none are needed to reproduce a learning process.
Hardware Specification | Yes | All experiments run on a single NVIDIA A100 GPU.
Software Dependencies | No | The paper names its base model, Stable Cascade (Pernias et al., 2024), and components such as the CLIP text encoder (Radford et al., 2021), the CSD image encoder (Somepalli et al., 2024), and Lang SAM. However, it provides no version numbers for general software dependencies (Python, PyTorch, CUDA, or other libraries) that full reproducibility would require.
Experiment Setup | Yes | "Our method introduces only two hyper-parameters: stepsize η and optimization steps M in Algorithm 1. We use DDIM sampling with η = 0.1 and M = 3 for all the experiments."
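The two hyper-parameters govern an inner gradient loop run at each denoising step: M gradient updates of stepsize η pulling the current latent's style descriptor toward the reference's. A toy sketch, assuming a quadratic stand-in for the style-matching loss (the real method differentiates through a CSD feature extractor, and `modulate` / `style_loss_grad` are hypothetical names, not the paper's API):

```python
import numpy as np

ETA, M = 0.1, 3  # stepsize and optimization steps used for all experiments

def style_loss_grad(x: np.ndarray, style_target: np.ndarray) -> np.ndarray:
    """Gradient of a toy quadratic stand-in for ||f(x) - f(ref)||^2,
    where f would be a style-descriptor network in the real method."""
    return 2.0 * (x - style_target)

def modulate(x: np.ndarray, style_target: np.ndarray,
             eta: float = ETA, steps: int = M) -> np.ndarray:
    """Inner loop sketch: M gradient steps of size eta nudging the
    current latent toward the reference style descriptor."""
    for _ in range(steps):
        x = x - eta * style_loss_grad(x, style_target)
    return x

# One denoising step's controller update on a toy 2-D "latent".
x0 = np.array([1.0, -1.0])
x1 = modulate(x0, np.zeros(2))  # strictly closer to the style target
```

In the full algorithm this update is interleaved with DDIM sampling, so the cost scales with M per denoising step, which is why a small M = 3 keeps inference practical on a single GPU.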