Diff-Shadow: Global-guided Diffusion Model for Shadow Removal

Authors: Jinting Luo, Ru Li, Chengzhi Jiang, Xiaoming Zhang, Mingyan Han, Ting Jiang, Haoqiang Fan, Shuaicheng Liu

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Comprehensive experiments on the ISTD, ISTD+, and SRD datasets demonstrate the effectiveness of Diff-Shadow. Compared to state-of-the-art methods, it achieves a significant PSNR improvement on the ISTD dataset, from 32.33 dB to 33.69 dB. Quantitative Evaluation: Table 1 shows the quantitative results on the test sets of ISTD, SRD, and ISTD+, respectively. Ablation studies are also reported.
Researcher Affiliation Collaboration Jinting Luo1, Ru Li2, Chengzhi Jiang1, Xiaoming Zhang3, Mingyan Han1, Ting Jiang1, Haoqiang Fan1, Shuaicheng Liu4* 1Megvii Technology Inc. 2Harbin Institute of Technology 3Southwest Jiaotong University 4University of Electronic Science and Technology of China
Pseudocode Yes Algorithm 1: Diffusive shadow removal model training Algorithm 2: Global-guided diffusive image restoration
Open Source Code Yes Code https://github.com/MonteCarluo/Diff-Shadow
Open Datasets Yes We used three standard datasets: 1) ISTD dataset (Wang, Li, and Yang 2018) includes 1,330 training and 540 testing triplets (shadow images, masks, and shadow-free images); 2) Adjusted ISTD (ISTD+) dataset (Le and Samaras 2019) reduces the illumination inconsistency between the shadow and shadow-free images of ISTD; 3) SRD dataset (Qu et al. 2017) consists of 2,680 training and 408 testing pairs of shadow and shadow-free images without shadow masks.
Dataset Splits Yes We used three standard datasets: 1) ISTD dataset (Wang, Li, and Yang 2018) includes 1,330 training and 540 testing triplets (shadow images, masks, and shadow-free images); 2) Adjusted ISTD (ISTD+) dataset (Le and Samaras 2019) reduces the illumination inconsistency between the shadow and shadow-free images of ISTD; 3) SRD dataset (Qu et al. 2017) consists of 2,680 training and 408 testing pairs of shadow and shadow-free images without shadow masks.
Hardware Specification Yes Our Diff-Shadow is trained using eight NVIDIA GTX 2080Ti GPUs.
Software Dependencies No The paper mentions the use of 'Adam optimizer' and 'sinusoidal positional encoding', but does not provide specific version numbers for these or any other software dependencies like Python, PyTorch, or CUDA.
Experiment Setup Yes Our Diff-Shadow is trained using eight NVIDIA GTX 2080Ti GPUs. The Adam optimizer (Kingma and Ba 2015) is applied to optimize the parallel UNets with a fixed learning rate of lr = 2e-4 without weight decay, and the number of training epochs is set to 2,000. The exponential moving average (Nichol and Dhariwal 2021) with a weight of 0.999 is applied for parameter updating. For each dataset, the images are resized to 256×256 for training. For each iteration, we initially sampled 8 images from the training set and randomly cropped 16 patches of size 64×64 from each image, plus a down-sampled global image of the same size corresponding to each patch. This process resulted in mini-batches consisting of 96 samples. We used input time-step embeddings for t through sinusoidal positional encoding (Vaswani et al. 2017) and provided these embeddings as inputs to each residual block in the local and global branches, allowing the model to share parameters across time. During sampling, the patch size R is set to 64 and the step size r is set to 8 to cover the whole image. The objective function includes two terms: the diffusive objective function L_diff and the global loss function L_global. The total loss is L_total = L_diff + λ·L_global, where λ is a hyper-parameter balancing the contributions of the two losses; we empirically set it to 1.
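Three mechanics in the setup above can be sketched in code: the sinusoidal time-step encoding (Vaswani et al. 2017) fed to each residual block, the sliding-window patch grid used at sampling time (R = 64, r = 8), and the total loss L_total = L_diff + λ·L_global with λ = 1. This is a minimal illustrative sketch; the function names and array shapes are assumptions, not the authors' implementation.

```python
import math
import numpy as np

def timestep_embedding(t, dim=128):
    """Sinusoidal positional encoding of diffusion timesteps t -> (len(t), dim)."""
    half = dim // 2
    # Geometric frequency schedule from 1 down to ~1/10000.
    freqs = np.exp(-math.log(10000.0) * np.arange(half) / half)
    args = np.asarray(t, dtype=np.float64)[:, None] * freqs[None, :]
    return np.concatenate([np.sin(args), np.cos(args)], axis=-1)

def patch_offsets(image_size=256, patch=64, step=8):
    """Top-left offsets of overlapping patches that tile the whole image."""
    offs = list(range(0, image_size - patch + 1, step))
    if offs[-1] != image_size - patch:  # ensure the image border is covered
        offs.append(image_size - patch)
    return offs

def total_loss(l_diff, l_global, lam=1.0):
    """L_total = L_diff + lambda * L_global, with lambda = 1 in the paper."""
    return l_diff + lam * l_global
```

With a 256×256 image, `patch_offsets()` yields 25 offsets per axis, i.e. a grid of 625 overlapping 64×64 patches whose predictions would be merged to cover the full image.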