HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models
Authors: Hayk Manukyan, Andranik Sargsyan, Barsegh Atanyan, Zhangyang Wang, Shant Navasardyan, Humphrey Shi
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that HD-Painter surpasses existing state-of-the-art approaches quantitatively and qualitatively across multiple metrics and a user study. Section 4 is dedicated to "EXPERIMENTS" which includes "QUANTITATIVE AND QUALITATIVE ANALYSIS" and "ABLATION STUDY". |
| Researcher Affiliation | Collaboration | The authors are affiliated with 1Picsart AI Research (PAIR), 2UT Austin, and 3Georgia Tech. Picsart AI Research (PAIR) is an industry affiliation, while UT Austin and Georgia Tech are academic institutions, indicating a collaborative affiliation. |
| Pseudocode | No | The paper describes methods using mathematical formulas and descriptive text, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is publicly available at: https://github.com/Picsart-AI-Research/HD-Painter. |
| Open Datasets | Yes | We evaluate the methods on a random sample of 10000 (image, mask, prompt) triplets from the validation set of MSCOCO 2017 (Lin et al., 2014). |
| Dataset Splits | Yes | We evaluate the methods on a random sample of 10000 (image, mask, prompt) triplets from the validation set of MSCOCO 2017 (Lin et al., 2014). |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions models such as "Stable Diffusion 1.5, Stable Diffusion 2.0 and Dreamshaper-8" and refers to "OpenCV (Itseez, 2015)", but it does not provide specific version numbers for key software libraries or dependencies. |
| Experiment Setup | Yes | PAIntA is used to replace the self-attention layers at the H/32 × W/32 and H/16 × W/16 resolutions for the first half of the generation steps. For RASG, only the cross-attention similarity matrices at the H/32 × W/32 resolution are selected, since utilizing higher resolutions did not offer significant improvements. For the hyperparameters {σ_t}_{t=1}^{T}, the authors chose σ_t = η · √((1 − α_{t−1})/(1 − α_t)) · √(1 − α_t/α_{t−1}) with η = 0.15. The methods are evaluated on a random sample of 10000 (image, mask, prompt) triplets from the validation set of MSCOCO 2017 (Lin et al., 2014). |
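
The σ_t schedule reported in the experiment setup follows the standard DDIM variance formula scaled by η. A minimal sketch of computing it is shown below; the underlying α_t (cumulative noise) schedule is not specified in the paper, so a generic linear-beta DDPM schedule is assumed here purely for illustration.

```python
import numpy as np

# Hypothetical alpha schedule: cumulative products from a linear-beta DDPM
# schedule. The paper does not state the exact schedule; this is an assumption.
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = np.cumprod(1.0 - betas)  # alpha_t, strictly decreasing in t

eta = 0.15  # value reported in the paper

# DDIM-style variance:
# sigma_t = eta * sqrt((1 - alpha_{t-1}) / (1 - alpha_t)) * sqrt(1 - alpha_t / alpha_{t-1})
sigmas = np.zeros(T)
for t in range(1, T):
    sigmas[t] = (
        eta
        * np.sqrt((1.0 - alphas[t - 1]) / (1.0 - alphas[t]))
        * np.sqrt(1.0 - alphas[t] / alphas[t - 1])
    )
```

With η = 0 this reduces to deterministic DDIM sampling; η = 0.15 injects a small amount of stochasticity at each step, which RASG relies on for its gradient-guided resampling.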