TP-Blend: Textual-Prompt Attention Pairing for Precise Object-Style Blending in Diffusion Models

Authors: Xin Jin, Yichuan Zhong, Yapeng Tian

TMLR 2025

Reproducibility Variable Result LLM Response
Research Type: Experimental. "4 Experiments. 4.1 Implementation Details. Model Architecture. All experiments employ SD-XL (Podell et al., 2023) as the diffusion backbone. 4.2 Comparisons with SOTA Models. Quantitative Evaluation of Object Replacement and Blending. Table 1 presents BOM scores for 800 replacement-blend pairs. 4.3 Ablation Study. Ablation Study on CAOF. To examine how CAOF controls the fusion strength, we vary the blending coefficient w0 ∈ [0.1, 0.9] (Eq. 9) and record the CLIP similarities for the original (O), replaced (R), and blend (B) prompts."
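The quoted ablation sweeps a blending coefficient w0 that controls fusion strength. A minimal sketch of what such a coefficient does, assuming a simple convex combination of attention feature maps (the function name and tensor layout are hypothetical, not the paper's code):

```python
import numpy as np

def caof_blend(attn_original, attn_blend, w0=0.5):
    """Hypothetical sketch: fuse two attention feature maps.

    w0 is swept over [0.1, 0.9] in the quoted ablation; larger w0
    gives the blend-object features more influence.
    """
    assert 0.0 <= w0 <= 1.0, "blending coefficient must lie in [0, 1]"
    return (1.0 - w0) * attn_original + w0 * attn_blend
```

This is the simplest reading of a scalar fusion coefficient; the paper's Eq. 9 may combine the features differently.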
Researcher Affiliation: Collaboration. Xin Jin (Gen Pi Inc.), Yichuan Zhong (Gen Pi Inc.), Yapeng Tian (The University of Texas at Dallas).
Pseudocode: No. The paper describes its methods, CAOF and SASF, using prose, mathematical equations (Eqs. 1–19), and flowcharts (Figures 4 and 6); it does not contain any structured pseudocode or algorithm blocks.
Open Source Code: Yes. "Code Availability. Code is available at https://github.com/felixxinjin1/TP-Blend."
Open Datasets: Yes. "For our evaluation, we assembled a diverse set of high-resolution, publicly available images from Unsplash (https://unsplash.com/), following the same practice as prior work such as SLIDE (Jampani et al., 2021) and Text-driven Image Editing via Learnable Regions (Lin et al., 2024). The test dataset consists of 4,000 samples, created by pairing 40 base images with 20 distinct replace-blend object combinations and 5 distinct blend styles."
Dataset Splits: Yes. "The test dataset consists of 4,000 samples, created by pairing 40 base images with 20 distinct replace-blend object combinations and 5 distinct blend styles."
Hardware Specification: No. The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for its experiments; it only mentions "SD-XL Podell et al. (2023) as the diffusion backbone," which names a model, not the hardware it ran on.
Software Dependencies: No. The paper mentions "SD-XL Podell et al. (2023)" as the diffusion backbone and algorithms such as Classifier-Free Guidance (CFG), DDIM inversion (Song et al., 2020), Optimal Transport, and the Sinkhorn algorithm (Cuturi, 2013; Peyré et al., 2019; Genevay et al., 2016), but it does not provide version numbers for any software libraries, programming languages, or other ancillary dependencies used for implementation.
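The Sinkhorn algorithm named above is a standard routine for entropic-regularised optimal transport; a minimal NumPy version with uniform marginals, independent of the paper's implementation, looks like this:

```python
import numpy as np

def sinkhorn(cost, gamma=0.1, n_iters=200):
    """Entropic-regularised OT plan between two uniform marginals.

    cost:  (n, m) cost matrix.
    gamma: regularisation strength (the report quotes gamma = 0.1).
    """
    n, m = cost.shape
    K = np.exp(-cost / gamma)           # Gibbs kernel
    a = np.full(n, 1.0 / n)             # uniform source marginal
    b = np.full(m, 1.0 / m)             # uniform target marginal
    u = np.ones(n)
    for _ in range(n_iters):            # alternating scaling updates
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]  # transport plan P
```

The returned plan satisfies the source marginal exactly (the loop ends on a u-update) and the target marginal up to the tolerance reached after `n_iters` iterations.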
Experiment Setup: Yes. "During the forward denoising pass we apply, at every timestep: (i) TIE-CFG for object replacement (positive guidance on the target prompt, negative on the original); (ii) CAOF to transport blend-object features into attention positions selected by the joint percentile thresholds τsource = τdest ∈ {0.6, 0.7}; and (iii) SASF to inject style via DSIN and key-value substitution. The Sinkhorn regulariser is fixed to γ = 0.1, with cost weights λfeature = 0.7 and λspatial = 0.3 (Eq. 10)."
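The quoted setup fixes cost weights λfeature = 0.7 and λspatial = 0.3 (Eq. 10) and percentile thresholds τ ∈ {0.6, 0.7}. A sketch of how such a weighted cost and percentile selection could be built, where the particular distance choices (cosine for features, Euclidean for positions) are assumptions rather than the paper's stated definitions:

```python
import numpy as np

def combined_cost(feat_src, feat_dst, pos_src, pos_dst,
                  lam_feature=0.7, lam_spatial=0.3):
    """Hypothetical weighted OT cost; the lambda weights match the
    quoted setup, the distance choices are assumed."""
    f = feat_src / np.linalg.norm(feat_src, axis=1, keepdims=True)
    g = feat_dst / np.linalg.norm(feat_dst, axis=1, keepdims=True)
    c_feat = 1.0 - f @ g.T                              # cosine distance
    c_spat = np.linalg.norm(                            # Euclidean distance
        pos_src[:, None, :] - pos_dst[None, :, :], axis=-1)
    return lam_feature * c_feat + lam_spatial * c_spat

def percentile_mask(attn, tau=0.6):
    """Keep attention positions at or above the tau-quantile
    (tau_source = tau_dest in {0.6, 0.7} in the quoted setup)."""
    return attn >= np.quantile(attn, tau)
```

Only positions passing `percentile_mask` on both source and destination sides would participate in the transport, which matches the report's description of "attention positions selected by the joint percentile thresholds."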