CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation

Authors: Nikolai Kalischek, Michael Oechsle, Fabian Manhardt, Philipp Henzler, Konrad Schindler, Federico Tombari

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section details our experimental setup, followed by quantitative and qualitative evaluations. We compare the performance of CubeDiff against the state of the art and ablate our design choices.
Researcher Affiliation | Collaboration | Nikolai Kalischek (ETH Zürich, Google); Michael Oechsle (Google); Fabian Manhardt (Google); Philipp Henzler (Google); Konrad Schindler (ETH Zürich); Federico Tombari (Google)
Pseudocode | No | The paper describes the model architecture and training pipeline but does not include any explicitly labeled pseudocode or algorithm blocks. Figure 2 provides a pipeline overview diagram.
Open Source Code | Yes | Project page: https://cubediff.github.io/
Open Datasets | Yes | We train on a mixture of indoor and outdoor environments by combining multiple publicly available sources, including Polyhaven (polyhaven.com, accessed 09/2024), Humus (Persson, accessed 09/2024), Structured3D (Zheng et al., 2020) and Pano360 (Kocabas et al., 2021), giving in total around 48,000 panoramas for training. We evaluate our method on the common Laval Indoor (Gardner et al., 2017) and Sun360 (Xiao et al., 2018) datasets.
Dataset Splits | Yes | Training. We train on a mixture of indoor and outdoor environments by combining multiple publicly available sources, including Polyhaven (polyhaven.com, accessed 09/2024), Humus (Persson, accessed 09/2024), Structured3D (Zheng et al., 2020) and Pano360 (Kocabas et al., 2021), giving in total around 48,000 panoramas for training. Testing. We evaluate our method on the common Laval Indoor (Gardner et al., 2017) and Sun360 (Xiao et al., 2018) datasets.
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper mentions methods like Adam, v-prediction, DDIM sampling, and models such as Stable Diffusion and Gemini, but it does not specify version numbers for any software libraries or frameworks used in the implementation.
Experiment Setup | Yes | We finetune our model using Adam (Kingma & Ba, 2014) and train for 30,000 iterations with batch size 64. The learning rate is ramped up to 8 × 10⁻⁵ in the first 10,000 steps. During training, we employ classifier-free guidance, dropping conditional signals 10% of the time. We employ DDIM sampling (Song et al., 2020) with 50 steps during inference.
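The quoted hyperparameters can be collected into a short sketch. This is an illustrative reconstruction, not the authors' code: the warmup shape (linear) and the null-conditioning mechanism are assumptions, and only the numbers (30,000 iterations, batch size 64, peak learning rate 8 × 10⁻⁵ over 10,000 warmup steps, 10% condition dropout, 50 DDIM steps) come from the paper.

```python
import random

# Values quoted from the paper's experiment setup.
TOTAL_STEPS = 30_000      # finetuning iterations (batch size 64)
WARMUP_STEPS = 10_000     # learning rate is "ramped up" over these steps
PEAK_LR = 8e-5
COND_DROP_PROB = 0.10     # conditioning dropped 10% of the time
DDIM_STEPS = 50           # inference sampling steps

def learning_rate(step: int) -> float:
    """Assumed linear ramp to PEAK_LR, then constant
    (the exact ramp shape is not specified in the paper)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR

def drop_condition(rng: random.Random) -> bool:
    """During training, replace the conditioning signal with a null
    embedding 10% of the time to enable classifier-free guidance."""
    return rng.random() < COND_DROP_PROB

def cfg_combine(eps_uncond: float, eps_cond: float, scale: float) -> float:
    """Standard classifier-free guidance combination at inference:
    eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

Training the conditional and unconditional model jointly via condition dropout is what allows the single network to be queried both ways at sampling time, so the two predictions can be blended with a guidance scale as in `cfg_combine`.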