BloomScene: Lightweight Structured 3D Gaussian Splatting for Crossmodal Scene Generation

Authors: Xiaolu Hou, Mingcheng Li, Dingkang Yang, Jiawei Chen, Ziyun Qian, Xiao Zhao, Yue Jiang, Jinjie Wei, Qingyao Xu, Lihua Zhang

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Comprehensive experiments across multiple scenes demonstrate the significant potential and advantages of our framework compared with several baselines." "Comprehensive qualitative and quantitative experiments across multiple scenarios show that the proposed framework has significant advantages over several baselines." Table 1: Performance comparison among BloomScene and baselines. Our approach achieves the best results. Table 2: Ablation results of different components.
Researcher Affiliation | Academia | 1. Academy for Engineering and Technology, Fudan University; 2. Institute of Metaverse & Intelligent Medicine, Fudan University; 3. Engineering Research Center of AI and Robotics, Ministry of Education; 4. Jilin Provincial Key Laboratory of Intelligence Science and Engineering; 5. Artificial Intelligence and Unmanned Systems Engineering Research Center of Jilin Province
Pseudocode | No | The paper describes its methodology using mathematical formulas and descriptive text, such as the equations for L_pixel^DPR, L_dist^DPR, L_smooth^DPR, L_DPR, and L_SCC, and a workflow diagram in Figure 1, but it does not include a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | Code: https://github.com/SparklingH/BloomScene
Open Datasets | No | "Previous reference-based metrics (e.g., PSNR and LPIPS (Zhang et al. 2018)) are not suitable for this generation task due to the lack of 3D scenes related to text prompts as reference." The paper mentions using text prompts (e.g., "A living room with a lit furnace...") for generation and evaluation, but these are descriptions rather than a publicly available dataset with concrete access information that the authors used as input for their experiments.
Dataset Splits | No | The paper focuses on generative scene creation from text/image inputs and evaluates using reference-free metrics on a list of 9 text prompts. It does not mention using a traditional dataset with specific training, validation, or testing splits for its experimental setup.
Hardware Specification | Yes | "All experiments are done on a single NVIDIA A800 GPU."
Software Dependencies | No | The paper mentions several pre-trained models: Stable Diffusion v1.5 (Rombach et al. 2022), LLaVA (Contributors 2023), the Stable Diffusion v1.5 Inpainting model (Rombach et al. 2022), and ZoeDepth (Bhat et al. 2023). While model versions like "v1.5" are specified, the paper does not provide version numbers for the underlying software libraries or languages (e.g., Python, PyTorch, CUDA) that would be needed for replication.
Experiment Setup | Yes | "To generate 3D scenes, we move the camera with a rotation of 0.63 radians." The parameters λ1, λ2, and λ3 are set to 0.7, 0.1, and 1.0, respectively. The parameters λ4 and λ5 are set to 1e-2 and 2e-3. All experimental results are averaged over multiple experiments using five different random seeds.
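The reported setup can be collected into a small configuration sketch. This is illustrative only: the key names and the helper function are assumptions, not taken from the official BloomScene repository; only the numeric values come from the paper.

```python
# Hedged sketch of the experiment setup reported in the paper.
# Key names are illustrative; the official code may organize these differently.
CONFIG = {
    "camera_rotation_rad": 0.63,  # per-step camera rotation for scene generation
    "lambda1": 0.7,               # loss weights as reported in the paper
    "lambda2": 0.1,
    "lambda3": 1.0,
    "lambda4": 1e-2,
    "lambda5": 2e-3,
    "num_random_seeds": 5,        # results averaged over five seeds
}

def total_loss_weight(cfg: dict) -> float:
    """Sum of the five loss weights (illustrative helper, not from the paper)."""
    return sum(cfg[f"lambda{i}"] for i in range(1, 6))
```

Recording the weights in one place like this makes it easy to check a replication against the paper's stated values before running any experiments.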