TD-Paint: Faster Diffusion Inpainting Through Time-Aware Pixel Conditioning

Authors: Tsiry Mayet, Pourya Shamsolmoali, Simon Bernard, Eric Granger, Romain Hérault, Clément Chatelain

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results across three datasets show that TD-Paint outperforms state-of-the-art diffusion models while maintaining lower complexity. GitHub code: https://github.com/MaugrimEP/td-paint
Researcher Affiliation | Academia | Tsiry Mayet, INSA Rouen Normandie, LITIS UR 4108, F-76000 Rouen, France; Pourya Shamsolmoali, University of York, United Kingdom / East China Normal University, China; Simon Bernard, Université Rouen Normandie, LITIS UR 4108, F-76000 Rouen, France; Eric Granger, LIVIA, Dept. of Systems Engineering, ETS Montreal, Canada; Romain Hérault, Université Caen Normandie, CNRS, GREYC UMR6072, F-14000 Caen, France; Clément Chatelain, INSA Rouen Normandie, LITIS UR 4108, F-76000 Rouen, France
Pseudocode | Yes | Algorithm 1: TD-Paint Generation Process.
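The paper's Algorithm 1 is not reproduced here, but the core idea behind time-aware pixel conditioning can be illustrated with a minimal sketch: known (unmasked) pixels are treated as already denoised (timestep 0), while masked pixels carry the current diffusion timestep. The function name `build_time_map` and the list-based representation below are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): building a per-pixel
# timestep map for time-aware conditioning. Known pixels (mask == 1)
# are assigned timestep 0 (clean); masked pixels (mask == 0) are
# assigned the current diffusion timestep t.

def build_time_map(mask, t):
    """mask: 2D list of 0/1 (1 = known pixel); t: current timestep.
    Returns a per-pixel timestep map of the same shape."""
    return [[0 if known else t for known in row] for row in mask]

mask = [
    [1, 1, 0],
    [1, 0, 0],
]
print(build_time_map(mask, 250))
# [[0, 0, 250], [0, 250, 250]]
```

At each reverse-diffusion step the denoiser would then receive this spatial timestep map instead of a single scalar timestep, which is what lets known pixels act as clean conditioning without extra resampling passes.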
Open Source Code | Yes | GitHub code: https://github.com/MaugrimEP/td-paint
Open Datasets | Yes | Our approach is validated using the CelebA-HQ (Karras et al., 2018) dataset, the ImageNet1K (Russakovsky et al., 2015) dataset, and the Places2 (Zhou et al., 2018) dataset at 256x256 resolution.
Dataset Splits | No | For evaluation, the paper uses 2,824 images from the CelebA-HQ test set, 5,000 images from ImageNet1K, and 2,000 images from Places2. Test set sizes are specified, but explicit training/validation splits or percentages are not provided.
Hardware Specification | Yes | Training on CelebA-HQ is conducted for approximately 150K steps with batch size 64 on 4 A100 GPUs; for ImageNet1K and Places2, for about 200K steps with batch size 128 on 8 A100 GPUs. We compare the time efficiency of different diffusion approaches working in the pixel space by computing the average time to sample 100 images consecutively on a single V100.
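The timing protocol quoted above (average wall-clock time to sample 100 images consecutively) can be sketched as follows. `sample_image` is a hypothetical stand-in for one full reverse-diffusion sampling pass, not the paper's sampler.

```python
# Sketch of the timing protocol: average wall-clock seconds per image
# over n consecutive sampling calls. `sample_image` is a placeholder
# (an assumption) for the actual diffusion sampler.
import time

def sample_image():
    # Placeholder for one full reverse-diffusion sampling pass.
    time.sleep(0.001)

def average_sampling_time(n_images=100):
    start = time.perf_counter()
    for _ in range(n_images):
        sample_image()
    return (time.perf_counter() - start) / n_images

avg = average_sampling_time(100)
print(f"average seconds per image: {avg:.4f}")
```

On a real GPU, a fair measurement would also need to synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the clock, since CUDA kernel launches are asynchronous.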
Software Dependencies | No | The paper states that it modifies the implementation of Dhariwal & Nichol (2021), maintaining all their hyperparameters, and refers to the TorchMetrics (Nicki Skafte Detlefsen et al., 2022) implementation, but it does not specify a version for TorchMetrics, nor versions for other key software components such as Python or PyTorch.
Experiment Setup | Yes | Training on CelebA-HQ is conducted for approximately 150K steps with batch size 64 on 4 A100 GPUs; for ImageNet1K and Places2, for about 200K steps with batch size 128 on 8 A100 GPUs. For LDM, the encoded masked image and the downsampled mask provide additional context during the sampling process. For ControlNet, the encoded masked image alone provides additional context.