HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models

Authors: Hayk Manukyan, Andranik Sargsyan, Barsegh Atanyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches quantitatively and qualitatively across multiple metrics and a user study. Section 4 is dedicated to "EXPERIMENTS", which includes "QUANTITATIVE AND QUALITATIVE ANALYSIS" and "ABLATION STUDY".
Researcher Affiliation | Collaboration | The authors are affiliated with 1) Picsart AI Research (PAIR), 2) UT Austin, and 3) Georgia Tech. Picsart AI Research (PAIR) is an industry affiliation, while UT Austin and Georgia Tech are academic institutions, indicating a cross-sector collaboration.
Pseudocode | No | The paper describes its methods using mathematical formulas and descriptive text, but it does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is publicly available at: https://github.com/Picsart-AI-Research/HD-Painter.
Open Datasets | Yes | We evaluate the methods on a random sample of 10000 (image, mask, prompt) triplets from the validation set of MSCOCO 2017 (Lin et al., 2014).
Dataset Splits | Yes | We evaluate the methods on a random sample of 10000 (image, mask, prompt) triplets from the validation set of MSCOCO 2017 (Lin et al., 2014).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper mentions models such as "Stable Diffusion 1.5, Stable Diffusion 2.0 and Dreamshaper-8" and refers to "OpenCV (Itseez, 2015)", but does not provide specific version numbers for key software libraries or dependencies.
Experiment Setup | Yes | PAIntA is used to replace the self-attention layers at the H/32 × W/32 and H/16 × W/16 resolutions for the first half of the generation steps. For RASG, only cross-attention similarity matrices at the H/32 × W/32 resolution are selected, since utilizing higher resolutions did not offer significant improvements. For the hyperparameters {σ_t}_{t=1}^T, the paper chooses σ_t = √((1 − α_{t−1})/(1 − α_t)) · √(1 − α_t/α_{t−1}), with η = 0.15. The methods are evaluated on a random sample of 10000 (image, mask, prompt) triplets from the validation set of MSCOCO 2017 (Lin et al., 2014).
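The σ_t choice quoted in the experiment-setup row matches the standard DDIM variance formula. A minimal sketch of that schedule, assuming `alpha_bar` is a decreasing array of cumulative ᾱ_t values (as in Stable Diffusion schedulers); the helper name `ddim_sigmas` is hypothetical, not from the paper or its code:

```python
import numpy as np

def ddim_sigmas(alpha_bar):
    """Hypothetical helper computing the sigma schedule quoted above:
    sigma_t = sqrt((1 - abar_{t-1}) / (1 - abar_t)) * sqrt(1 - abar_t / abar_{t-1}).
    `alpha_bar` is assumed to hold cumulative alphas, decreasing in t."""
    a_t, a_prev = alpha_bar[1:], alpha_bar[:-1]
    return np.sqrt((1 - a_prev) / (1 - a_t)) * np.sqrt(1 - a_t / a_prev)

# Toy example: a linearly decreasing cumulative-alpha schedule over 50 steps.
alpha_bar = np.linspace(0.99, 0.01, 50)
sigmas = ddim_sigmas(alpha_bar)
```

Note that η = 0.15 is reported as a separate hyperparameter in the paper's setup, so it is not folded into σ_t in this sketch.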