Spherical-Nested Diffusion Model for Panoramic Image Outpainting
Authors: Xiancheng Sun, Senmao Ma, Shengxi Li, Mai Xu, Jingyuan Xia, Lai Jiang, Xin Deng, Jiali Wang
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results have verified the significantly superior performance of our SpND model. Our contributions are three-fold: We seamlessly incorporate the spherical noise by investigating ERP distortion when training diffusion models, such that the structural prior of panoramic images can be well accommodated. We propose the SDC layer that is the first successful attempt to satisfy the intrinsic sphere nature within generative model architectures, with adaptive and consistent receptive fields. We develop the SpND model by incorporating the spherical noise and SDC layer as fundamental modules, accomplished by the CME to ensure high-quality panoramic image outpainting. |
| Researcher Affiliation | Collaboration | 1Department of Electronic Information Engineering, Beihang University, Beijing, China 2State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China 3Department of Electronic Science and Technology, National University of Defense Technology, Changsha, China 4Cainiao Technology, Hangzhou, China. Correspondence to: Shengxi Li <Li EMAIL>. |
| Pseudocode | No | The paper describes the methodology using mathematical equations and textual descriptions, for example, in Section 3, 'Methodology', but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Codes are publicly available at https://github.com/chronos123/SpND. |
| Open Datasets | Yes | Datasets. To evaluate the performance of our SpND model, we employed the widely applied Matterport3D (Chang et al., 2017) and Structured3D (Zheng et al., 2020) datasets for comparison. |
| Dataset Splits | Yes | Similar to (Lin et al., 2019), we obtained 10,912 panoramic images with size 1024×512 for the Matterport3D dataset. A total of 9,820 images were selected for training, and all 10,912 images were used for evaluation to compute the sufficient statistics. For the Structured3D dataset, we followed the methodology outlined in (Wu et al., 2024b) to obtain 21,133 images, of which 19,019 images were used for training and all 21,133 images were used for evaluation to compute the sufficient statistics. |
| Hardware Specification | No | The paper mentions using a 'pre-trained diffusion model' and 'pre-trained VAE model' but does not specify any particular hardware (e.g., GPU models, CPU types) used for conducting its own experiments or training the Spherical-Nested Diffusion (SpND) model. |
| Software Dependencies | No | The paper mentions using the 'AdamW optimizer' and 'DDIM' for training and inference, respectively. It also states that 'Our SpND model was trained based on the pre-trained weights of (Zhang et al., 2023)'. However, it does not explicitly list specific software dependencies with version numbers, such as Python, PyTorch, TensorFlow, or CUDA versions. |
| Experiment Setup | Yes | Implementation Details. Our SpND model was trained based on the pre-trained weights of (Zhang et al., 2023). The hyperparameter ζ was set to 30 to obtain the structural prior for the ERP format. During training, we used the AdamW optimizer (Loshchilov & Hutter, 2019) with a learning rate of 10⁻⁵ and a batch size of 4. During inference, the classifier-free guidance scale (Ho & Salimans, 2021) was set to 3.0 with a DDIM (Song et al., 2021) step of 50. Additionally, the hyperparameter σ in SDC layers was set to 0.04 to constrain the learnable offset. |
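The reported training and inference hyperparameters can be collected into a single configuration for reproduction attempts. This is a minimal sketch: the dictionary keys and structure are illustrative assumptions, not the authors' actual code, but the values are those stated in the paper.

```python
# Hypothetical config sketch gathering the hyperparameters reported
# for the SpND experiments; key names are assumptions for illustration.
spnd_config = {
    # Training (Section on Implementation Details)
    "optimizer": "AdamW",      # Loshchilov & Hutter, 2019
    "learning_rate": 1e-5,
    "batch_size": 4,
    "zeta": 30,                # structural prior for the ERP format
    "sigma_sdc": 0.04,         # constrains the learnable offset in SDC layers
    # Inference
    "cfg_scale": 3.0,          # classifier-free guidance (Ho & Salimans, 2021)
    "ddim_steps": 50,          # DDIM sampling steps (Song et al., 2021)
}

if __name__ == "__main__":
    for key, value in spnd_config.items():
        print(f"{key}: {value}")
```

Keeping these values in one place makes it easy to spot which settings (e.g., hardware, library versions) the paper does *not* report, as flagged in the table above.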