Preference Adaptive and Sequential Text-to-Image Generation
Authors: Ofir Nabati, Guy Tennenholtz, Chihwei Hsu, Moonkyung Ryu, Deepak Ramachandran, Yinlam Chow, Xiang Li, Craig Boutilier
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach for PASTA involves a multi-stage data collection and training process. We first collect multi-turn interaction data from human raters with a baseline LMM. Using this sequential data, as well as large-scale, open-source (single-turn) preference data, we train a user simulator. In particular, we employ an EM strategy to train user preference and choice models, which capture implicit user preference types in the data. We then construct a new large-scale dataset, which consists of interactions between a simulated user and the LMM. Finally, we leverage this augmented data, encompassing both human and simulated interactions, to train PASTA, our value-based RL agent, which presents a sequence of diverse slates of images to a user. PASTA interacts with the user and sequentially refines its generated images to better suit their underlying preferences. We evaluate PASTA using human raters, showing significant improvement compared to baseline methods. |
| Researcher Affiliation | Industry | 1Google Research 2Google DeepMind. Correspondence to: Ofir Nabati <EMAIL>, Guy Tennenholtz <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 Mini-Batch Expectation-Maximization User Model Optimization |
| Open Source Code | No | We also open-source our sequential rater dataset and simulated user-rater interactions to support future research in user-centric multi-turn T2I systems. (Only the data is released; no code release is mentioned.) Link to the dataset: https://www.kaggle.com/datasets/googleai/pasta-data. |
| Open Datasets | Yes | We also open-source our sequential rater dataset and simulated user-rater interactions to support future research in user-centric multi-turn T2I systems. Both human and simulation data are open-sourced to support research on multi-turn T2I generation. Link to the dataset: https://www.kaggle.com/datasets/googleai/pasta-data. |
| Dataset Splits | No | First, we assess prediction accuracy on the Pick-a-Pic test set (Kirstain et al., 2023) and ranking using Spearman's rank correlation (Spearman, 1961) on the HPS dataset (Wu et al., 2023). Second, we evaluate prompt choice prediction accuracy and cross-turn preference accuracy on our human-rated data. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU or CPU models. It only mentions the models used (Stable Diffusion XL, Gemini 1.5 Flash, Gemma 2B) but not the underlying hardware. |
| Software Dependencies | No | The paper mentions optimizers like AdamW and Adafactor, and models like Stable Diffusion XL, Gemini 1.5 Flash, and Gemma 2B, but does not provide specific version numbers for software libraries or frameworks (e.g., Python, PyTorch, TensorFlow) that would be needed for replication. |
| Experiment Setup | Yes | E.4. Hyperparameters. Table 1 (main training): learning rate: cosine annealing scheduler (lr=3e-4, T=10e3) (Loshchilov & Hutter, 2016); training steps: 50e3; batch size: 2048; update target network phase: 256; optimizer: AdamW (Loshchilov & Hutter, 2019), weight decay = 1e-4; κ1 = 1; α_prior = 0.999. Table 2 (fine-tuning): learning rate: cosine annealing scheduler (lr=3e-7, T=10e3); training steps: 50e3; batch size: 8; gradient norm clipping: 0.5; update target network phase: 256; optimizer: AdamW, weight decay = 1e-2; κ2 = 0.01; κ3 = 1; κ4 = 0.1; α_prior = 0.999; τ_max = 3. Table 3 (PASTA): learning rate: 1e-5; training steps: 1e4; batch size: 128; optimizer: Adafactor (Shazeer & Stern, 2018), weight decay = 1e-2; gradient norm clipping: 1; expectile parameter α = 0.7; ℓ_q = 651; ℓ_v = 651; L = 4; M = 4; L_C = 25; number of categories: 5; H = 5; N^w_max = 62. |
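The paper's Algorithm 1 ("Mini-Batch Expectation-Maximization User Model Optimization") is only named in the table above, not reproduced. As a rough illustration of the technique, here is a minimal mini-batch EM sketch for fitting latent user preference types under an assumed mixture-of-logistic-choice model; all names, sizes, and the model form are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

K, D = 3, 8                          # assumed: number of latent user types, feature dim
W = rng.normal(size=(K, D)) * 0.1    # per-type utility weights
log_pi = np.log(np.ones(K) / K)      # log prior over latent user types

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def em_minibatch_step(W, log_pi, feats, choices, lr=0.1):
    """One mini-batch EM update on pairwise choice data.

    feats:   (B, D) feature difference between the two candidate images
    choices: (B,)   1 if the first image was chosen, else 0
    """
    # E-step: posterior responsibility of each latent type for each interaction
    logits = feats @ W.T                              # (B, K) per-type utilities
    p = sigmoid(logits)
    ll = np.where(choices[:, None] == 1,
                  np.log(p + 1e-9), np.log(1 - p + 1e-9))
    log_post = log_pi[None, :] + ll
    log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)           # responsibilities (B, K)

    # M-step: responsibility-weighted gradient ascent on the choice likelihood,
    # plus a closed-form update of the type prior
    err = choices[:, None] - p                        # Bernoulli gradient (B, K)
    grad = (post * err).T @ feats / len(feats)        # (K, D)
    W_new = W + lr * grad
    pi_new = post.mean(axis=0)
    return W_new, np.log(pi_new + 1e-9)
```

Repeating `em_minibatch_step` over shuffled mini-batches yields a stochastic EM loop; the posterior `post` is what makes the implicit user preference types recoverable from choice data alone.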
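Table 3 lists an "expectile parameter α = 0.7" for the value-based RL agent. That hyperparameter is characteristic of expectile regression for value learning (as in implicit Q-learning); a minimal sketch of the asymmetric loss, offered as an assumption about its role rather than the paper's confirmed objective:

```python
import numpy as np

def expectile_loss(diff, alpha=0.7):
    """Asymmetric squared loss on diff = target_q - v(s).

    With alpha > 0.5, positive errors are weighted more heavily, so the
    fitted v(s) is pushed toward an upper expectile of the Q-value
    distribution instead of its mean.
    """
    weight = np.where(diff > 0, alpha, 1.0 - alpha)
    return float(np.mean(weight * diff ** 2))
```

For example, `expectile_loss(np.array([2.0, -1.0]), alpha=0.7)` weights the errors as `0.7 * 4 + 0.3 * 1` and averages to `1.55`, whereas `alpha = 0.5` recovers a symmetric (halved) mean-squared error.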