CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation
Authors: Nikolai Kalischek, Michael Oechsle, Fabian Manhardt, Philipp Henzler, Konrad Schindler, Federico Tombari
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This section details our experimental setup, followed by quantitative and qualitative evaluations. We compare the performance of CubeDiff against the state-of-the-art and ablate our design choices. |
| Researcher Affiliation | Collaboration | Nikolai Kalischek (ETH Zürich, Google); Michael Oechsle (Google); Fabian Manhardt (Google); Philipp Henzler (Google); Konrad Schindler (ETH Zürich); Federico Tombari (Google) |
| Pseudocode | No | The paper describes the model architecture and training pipeline but does not include any explicitly labeled pseudocode or algorithm blocks. Figure 2 provides a pipeline overview diagram. |
| Open Source Code | Yes | Project page: https://cubediff.github.io/ |
| Open Datasets | Yes | We train on a mixture of indoor and outdoor environments by combining multiple publicly available sources, including Polyhaven (polyhaven.com, accessed 09/2024), Humus (Persson, accessed 09/2024), Structured3D (Zheng et al., 2020) and Pano360 (Kocabas et al., 2021), giving in total around 48,000 panoramas for training. We evaluate our method on the common Laval Indoor (Gardner et al., 2017) and Sun360 (Xiao et al., 2018) datasets. |
| Dataset Splits | Yes | Training. We train on a mixture of indoor and outdoor environments by combining multiple publicly available sources, including Polyhaven (polyhaven.com, accessed 09/2024), Humus (Persson, accessed 09/2024), Structured3D (Zheng et al., 2020) and Pano360 (Kocabas et al., 2021), giving in total around 48,000 panoramas for training. Testing. We evaluate our method on the common Laval Indoor (Gardner et al., 2017) and Sun360 (Xiao et al., 2018) datasets. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions methods like Adam, v-prediction, DDIM sampling, and models such as Stable Diffusion and Gemini, but it does not specify version numbers for any software libraries or frameworks used in the implementation. |
| Experiment Setup | Yes | We finetune our model using Adam (Kingma & Ba, 2014) and train for 30,000 iterations with batch size 64. The learning rate is ramped up to 8 × 10⁻⁵ in the first 10,000 steps. During training, we employ classifier-free guidance, dropping conditional signals 10% of the time. We employ DDIM sampling (Song et al., 2020) with 50 steps during inference. |
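The hyperparameters in the Experiment Setup row can be sketched as two small helpers: a warmup learning-rate schedule and the classifier-free-guidance conditioning dropout. This is a minimal illustration, not the authors' code; the paper states only that the rate is "ramped up" over the first 10,000 steps, so the linear ramp shape and the idea of substituting a null conditioning embedding are assumptions.

```python
import random

# Values quoted from the paper's training setup (Adam, 30,000 iterations, batch size 64).
PEAK_LR = 8e-5          # peak learning rate reached after warmup
WARMUP_STEPS = 10_000   # "ramped up ... in the first 10,000 steps"
TOTAL_STEPS = 30_000
COND_DROP_PROB = 0.10   # conditioning dropped 10% of the time for classifier-free guidance


def learning_rate(step: int) -> float:
    """Assumed linear ramp to PEAK_LR over WARMUP_STEPS, then constant."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    return PEAK_LR


def drop_condition(rng: random.Random) -> bool:
    """Decide whether this training example's conditioning signal (text/image)
    is replaced by a null embedding, enabling classifier-free guidance at inference."""
    return rng.random() < COND_DROP_PROB
```

At inference the model would then be queried twice per DDIM step (conditional and unconditional) and the two predictions blended with a guidance weight; the paper uses 50 DDIM steps.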