FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting

Authors: Liyao Jiang, Negar Hassanpour, Mohammad Salameh, Mohan Sai Singamsetti, Fengyu Sun, Wei Lu, Di Niu

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive evaluations, we show FRAP generates images with higher or comparable prompt-image alignment to prompts from complex datasets, while having a lower average latency compared to recent latent code optimization methods... We extensively evaluate the faithfulness, overall image quality, and image authenticity of FRAP-generated images via prompt-image alignment metrics, image quality assessment metrics, and an image authenticity metric."
Researcher Affiliation | Collaboration | 1 Department of Electrical and Computer Engineering, University of Alberta; 2 Huawei Technologies Canada; 3 Huawei Kirin Solution, China
Pseudocode | Yes | "In Algorithm 1, we provide the detailed algorithm of our proposed FRAP method."
Open Source Code | Yes | "We release the code at the following link: https://github.com/LiyaoJiang1998/FRAP/."
Open Datasets | Yes | "We evaluate on three Simple, manually crafted prompt datasets from A&E: Animal-Animal (S-AA), Color-Object (S-CO), and Animal-Object (S-AO); and five Complex datasets from D&B: Animal-Scene (C-AS), Color-Object-Scene (C-COS), Multi-Object (C-MO), COCO-Attribute (C-CA), and COCO-Subject (C-CS)... We adopt the validation set of the MS-COCO dataset (Lin et al., 2014)... We refer to this dataset as COCO-5K and will release this dataset to facilitate reproducibility and further research."
Dataset Splits | No | The paper uses various prompt datasets (Animal-Animal, Multi-Object, Animal-Object, Color-Object, Animal-Scene, COCO-Attribute, COCO-Subject, Color-Obj-Scene, COCO-5K, DrawBench, ABC-6K) for evaluation, but does not specify training/validation/test splits for any model trained or fine-tuned within the scope of this work. For COCO-5K, it describes only how a subset was sampled: "This filtering process selects 16k most relevant prompts from the original 40k prompts in MS-COCO, and we randomly sample a 5k subset from the 16k most relevant prompts."
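The quoted COCO-5K construction (filter to the 16k most relevant prompts, then randomly draw 5k) can be sketched as below. The relevance scores and the `sample_coco_5k` helper name are assumptions for illustration; the quoted text does not specify the actual filtering criterion.

```python
import random

def sample_coco_5k(prompts, relevance, k_relevant=16000, k_final=5000, seed=0):
    """Sketch of the COCO-5K construction: keep the k_relevant most
    relevant prompts, then draw a random k_final subset from them.

    `relevance` maps each prompt to an assumed relevance score; the
    paper's actual filtering criterion is not given in the quoted text.
    """
    ranked = sorted(prompts, key=lambda p: relevance[p], reverse=True)
    top = ranked[:k_relevant]  # the "most relevant" pool
    return random.Random(seed).sample(top, k_final)  # seeded for repeatability
```

Seeding the sampler is one simple way to make such a subset reproducible, which is the concern this review row is checking.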
Hardware Specification | Yes | "Our reported latency measures the average wall-clock time for generating one image on each dataset in seconds with a V100 GPU."
Software Dependencies | No | The paper mentions using 'Stable Diffusion 1.5' as the base model, 'FP16 precision', the 'PNDM scheduler', and the 'spaCy language parser', but it does not provide version numbers for these components or for other key libraries such as Python, PyTorch, or CUDA.
Experiment Setup | Yes | "Following Chefer et al. (2023); Li et al. (2023), we use the 16x16 CA units for computing the objective function. The weight of the object-modifier binding loss is λ = 1. For the optimization in Eq. (9), we use a constant step size η_t = η = 1. We apply our adaptive prompt weighting method to a subset of time steps t = T, T-1, ..., t_end, where T = 50 and t_end = 26. For selecting the initial latent code, we perform 15 steps of inference from t = T = 50 to t_select = 36 with a batch of |B| = 4 noisy latent codes sampled from N(0, I). We use a CFG guidance scale of β = 7.5, and the Gaussian filter kernel size is 3 with a standard deviation of 0.5."
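For quick reference, the hyperparameters reported in this setup can be collected into a single configuration sketch. The dictionary keys and the helper functions are our own naming for illustration, not identifiers from the paper or its released code; only the values come from the quoted text.

```python
# Hedged sketch: names are ours; values are the reported hyperparameters.
FRAP_CONFIG = {
    "ca_resolution": (16, 16),  # cross-attention (CA) units used for the objective
    "lambda_binding": 1.0,      # weight λ of the object-modifier binding loss
    "eta": 1.0,                 # constant step size η in Eq. (9)
    "T": 50,                    # total denoising time steps
    "t_end": 26,                # adaptive weighting applied for t = T, ..., t_end
    "t_select": 36,             # initial-latent selection runs t = 50, ..., 36
    "batch_latents": 4,         # |B| candidate latent codes drawn from N(0, I)
    "cfg_scale": 7.5,           # classifier-free guidance scale β
    "gauss_kernel_size": 3,     # Gaussian filter kernel size
    "gauss_sigma": 0.5,         # Gaussian filter standard deviation
}

def weighting_steps(cfg):
    """Time steps (descending) where adaptive prompt weighting is applied."""
    return list(range(cfg["T"], cfg["t_end"] - 1, -1))

def selection_steps(cfg):
    """Time steps used for initial latent code selection."""
    return list(range(cfg["T"], cfg["t_select"] - 1, -1))
```

Note the internal consistency check this enables: running from t = 50 down to t_select = 36 inclusive is exactly the 15 inference steps the quote reports.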