Mixed-View Panorama Synthesis using Geospatially Guided Diffusion
Authors: Zhexiao Xiong, Xin Xing, Scott Workman, Subash Khanal, Nathan Jacobs
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the effectiveness of our proposed method. In particular, our model can handle scenarios when the available panoramas are sparse or far from the location of the panorama we are attempting to synthesize. We evaluate our approach for mixed-view panorama synthesis quantitatively and qualitatively through various experiments. |
| Researcher Affiliation | Collaboration | Zhexiao Xiong, Department of Computer Science & Engineering, Washington University in St. Louis; Xin Xing, Department of Computer Science, University of Nebraska Omaha; Scott Workman; Subash Khanal, Department of Computer Science & Engineering, Washington University in St. Louis; Nathan Jacobs, Department of Computer Science & Engineering, Washington University in St. Louis |
| Pseudocode | No | The paper describes the methodology in prose and mathematical equations (e.g., Equation 1, 2, 3, 4, 5) but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The project page is available at https://mixed-view.github.io. However, the paper does not state that the source code for the methodology is provided there, and the link is not a direct link to a code repository. |
| Open Datasets | Yes | We train and evaluate our methods using the Brooklyn and Queens dataset Workman et al. (2017). This dataset contains non-overlapping satellite images (approx. 30 cm resolution) and street-level panoramas from New York City collected from Google Street View. |
| Dataset Splits | Yes | For evaluation on the Brooklyn subset, we use the original train/test split, resulting in 38,744 images for training, 500 images for validation, and 4,361 images for testing. For cross-domain evaluation on Queens, we randomly select 1,000 images from the Queens subset and report the performance on the selected images. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | Yes | We use a pretrained Stable Diffusion Rombach et al. (2022) model (v1.5) with the default parameters. |
| Experiment Setup | Yes | For optimization, we use AdamW with a learning rate of λ = 2 × 10⁻⁵. The input images are resized to 256 × 1024 as local conditions. Specifically, the satellite image is resized to 256 × 256 and replicated horizontally four times. We use DDIM Song et al. (2020) for sampling with the number of time steps set to 50. The classifier-free guidance Ho & Salimans (2022) scale is set to 7.5. During training, we set the rates to keep/drop all conditions to 0.3 and 0.1 respectively, and set the dropout rate of each condition to 0.1. For the text prompts, we randomly replace 50% of text prompts with empty strings to enhance the model's ability to learn image geometric relationships. |
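The classifier-free guidance scale of 7.5 reported in the experiment-setup row combines conditional and unconditional noise predictions at each sampling step. A minimal sketch of the standard formulation from Ho & Salimans (2022), assuming per-step noise estimates `eps_uncond` and `eps_cond` (variable names are hypothetical, not from the paper):

```python
def classifier_free_guidance(eps_uncond: float, eps_cond: float,
                             scale: float = 7.5) -> float:
    """Standard classifier-free guidance: extrapolate from the
    unconditional noise prediction toward the conditional one.
    scale=7.5 matches the setting reported in the paper; scale=1.0
    recovers the purely conditional prediction."""
    return eps_uncond + scale * (eps_cond - eps_uncond)

# With scale 1.0 the guided estimate equals the conditional estimate.
assert classifier_free_guidance(0.0, 1.0, scale=1.0) == 1.0
```

In practice this is applied element-wise to the model's noise-prediction tensors; the scalar version above only illustrates the arithmetic.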
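The training-time conditioning dropout described above (keep all conditions with probability 0.3, drop all with probability 0.1, otherwise drop each condition independently with probability 0.1, and replace 50% of text prompts with empty strings) can be sketched as follows. This is a hypothetical reading of the reported rates, not the authors' code; the function and argument names are invented for illustration:

```python
import random

def apply_condition_dropout(conditions: dict, prompt: str,
                            rng: random.Random,
                            keep_all_rate: float = 0.3,
                            drop_all_rate: float = 0.1,
                            per_cond_rate: float = 0.1,
                            empty_prompt_rate: float = 0.5):
    """Sketch of the dropout schedule reported in the paper.
    `conditions` maps condition names (e.g. satellite image, nearby
    panoramas) to their values; a dropped condition is set to None."""
    r = rng.random()
    if r < keep_all_rate:
        kept = dict(conditions)               # keep every condition
    elif r < keep_all_rate + drop_all_rate:
        kept = {k: None for k in conditions}  # drop every condition
    else:
        # otherwise, drop each condition independently
        kept = {k: (v if rng.random() >= per_cond_rate else None)
                for k, v in conditions.items()}
    # independently, blank out the text prompt half the time
    if rng.random() < empty_prompt_rate:
        prompt = ""
    return kept, prompt
```

Dropping conditions during training is what makes classifier-free guidance possible at sampling time, since the model learns both conditional and unconditional predictions.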