StochSync: Stochastic Diffusion Synchronization for Image Generation in Arbitrary Spaces
Authors: Kyeongmin Yeo, Jaihoon Kim, Minhyuk Sung
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that StochSync provides the best performance in 360° panorama generation (where image conditioning is not given), outperforming previous finetuning-based methods, and also delivers results comparable to previous methods in 3D mesh texturing (where depth conditioning is provided). Project page is at https://stochsync.github.io/. In the experiments, we test StochSync on two applications: 360° panoramic image generation and mesh texture generation. |
| Researcher Affiliation | Academia | Kyeongmin Yeo, Jaihoon Kim, Minhyuk Sung (KAIST) |
| Pseudocode | Yes | Algorithm 1: Diffusion Reverse Process; Algorithm 2: Diffusion Synchronization (DS); Algorithm 3: Score Distillation Sampling (SDS); Algorithm 4: StochSync |
| Open Source Code | Yes | Project page is at https://stochsync.github.io/. We will also release our code publicly. |
| Open Datasets | Yes | Diffusion models pretrained on billions of images (Rombach et al., 2022; Midjourney) have demonstrated remarkable capabilities in various zero-shot applications. Stable Diffusion 2.1 Base is used as the pretrained diffusion model for all methods; the LAION-5B dataset (Schuhmann et al., 2022) is its training corpus. |
| Dataset Splits | No | The paper focuses on a zero-shot method using pretrained models. It describes evaluation using specific prompt sets (121 out-of-distribution prompts from PanFusion, 20 ChatGPT-generated prompts from L-MAGIC) and sampling 10 perspective view images from each panorama for evaluation metrics. It does not provide explicit training/test/validation splits for any dataset used in its own experiments. |
| Hardware Specification | Yes | Figure 12: Runtime comparison of NVIDIA RTX A6000 and Intel Gaudi-v2 across three different timestep settings in multi-step x0|t computation. Our evaluation indicates that the Gaudi-v2 achieves runtimes comparable to those of the A6000. |
| Software Dependencies | No | StochSync uses Stable Diffusion 2.1 Base (Rombach et al., 2022) for 360° panorama generation and the depth-conditioned ControlNet (Zhang et al., 2023) for 3D mesh texturing, both of which are publicly available. While specific models are named, version numbers for general software dependencies like Python, PyTorch, or CUDA are not provided. |
| Experiment Setup | Yes | We set the resolution of the perspective view images to 512 × 512, and the panorama to 2,048 × 4,096. A linearly decreasing timestep schedule is employed, starting from T = 900 and decreasing to Tstop = 270, with a total of 25 denoising steps. For multi-step x0\|t computation, the total number of steps is initially set to 50, decreasing linearly as the denoising process progresses. For view sampling, we alternate between two sets containing five views each, with azimuth angles of [0°, 72°, 144°, 216°, 288°] and [36°, 108°, 180°, 252°, 324°]. The elevation angle is set to 0°, and the field of view (FoV) is set to 72°. |
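The quoted setup (linear timestep schedule, linearly shrinking multi-step budget, alternating view sets) can be sketched as a minimal illustration. This is not the authors' released code; all function names are hypothetical, and the endpoint of 1 for the multi-step budget is an assumption, since the paper only states that it decreases linearly from 50.

```python
import numpy as np

def timestep_schedule(t_start=900, t_stop=270, num_steps=25):
    """Linearly decreasing denoising timesteps from t_start to t_stop."""
    return np.linspace(t_start, t_stop, num_steps).round().astype(int)

def multistep_budget(step, num_steps=25, initial=50, final=1):
    """Inner-step count for the multi-step x0|t computation.

    Decreases linearly from `initial` as denoising progresses;
    the `final` value of 1 is an assumption, not stated in the paper.
    """
    frac = step / (num_steps - 1)
    return max(final, round(initial * (1 - frac)))

# Two alternating sets of five azimuth angles (degrees);
# elevation is 0 and FoV is 72 degrees for every view.
VIEW_SETS = [
    [0, 72, 144, 216, 288],
    [36, 108, 180, 252, 324],
]

def views_for_step(step):
    """Alternate between the two azimuth sets each denoising step."""
    return VIEW_SETS[step % 2]
```

For example, `timestep_schedule()` yields 25 timesteps running from 900 down to 270, and consecutive denoising steps cover the panorama with interleaved azimuths so adjacent views overlap across steps.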