3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting
Authors: Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, Ceyuan Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the effectiveness and versatility of 3DitScene in scene image editing. Evaluations of 3DitScene under various settings show significant improvements over baseline methods. The paper includes sections for 'QUANTITATIVE RESULTS', 'QUALITATIVE RESULTS', and 'ABLATION STUDY'. |
| Researcher Affiliation | Collaboration | The authors are affiliated with universities such as CUHK (The Chinese University of Hong Kong), Stanford (Stanford University), and UCLA (University of California, Los Angeles), as well as industry entities like Snap Inc. and ByteDance, indicating a collaboration between academia and industry. |
| Pseudocode | No | The paper describes the method using textual descriptions and architectural diagrams (Figures 2 and 3), but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/zqh0253/3DitScene. |
| Open Datasets | No | The paper discusses processing 'scene images' and 'real images' for editing. For comparison, it mentions 'Object 3DIT, trained on synthetic datasets'. However, it does not provide specific access information (link, DOI, formal citation) for any publicly available or open dataset used for its own experimental evaluation or user study, beyond general mentions of input images. |
| Dataset Splits | No | The paper mentions a user study conducted on '20 samples for each method' but does not provide specific train/test/validation dataset splits with percentages, sample counts, or references to predefined splits for model training or evaluation. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU models, CPU types, memory) used for running the experiments or training the models. |
| Software Dependencies | Yes | To lift an image to 3D, we use GeoWizard (Fu et al., 2024) to estimate its relative depth. The inpainting pipeline of Stable Diffusion (Rombach et al., 2022) is adopted... We leverage MobileSAM (Zhang et al., 2023a) and OpenCLIP (Ilharco et al., 2021) to segment and compute rendered views' feature maps. |
| Experiment Setup | Yes | We perform 1500 SDS steps to optimize the whole scene. We randomly sample the diffusion time step from [l, r], where l = 0.02, and r starts at 0.5 and gradually decreases to 0.2 by the 1000th step. We use guidance strength of 5 for classifier-free guidance. In Eq. (4), we choose λrecon = 1000, λSDS = 0.01, and λdistill = 1. |
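The timestep schedule quoted in the Experiment Setup row (sample t from [l, r], with l = 0.02 fixed and r annealed from 0.5 down to 0.2 by step 1000 of 1500 total SDS steps) can be sketched as follows. This is a minimal illustration of the described schedule, not the authors' code; the linear annealing shape and the function names are assumptions, and only the endpoint values come from the paper.

```python
import random

# Paper-reported loss weights from Eq. (4)
LAMBDA_RECON = 1000.0
LAMBDA_SDS = 0.01
LAMBDA_DISTILL = 1.0

def timestep_range(step, anneal_steps=1000, r_start=0.5, r_end=0.2, l=0.02):
    """Return (l, r) bounds for the diffusion timestep at a given SDS step.

    l stays fixed at 0.02; r decreases from 0.5 to 0.2 by `anneal_steps`
    (linear decay is an assumption; the paper only says "gradually decreases").
    """
    frac = min(step / anneal_steps, 1.0)
    r = r_start + (r_end - r_start) * frac
    return l, r

def sample_timestep(step):
    """Uniformly sample a normalized diffusion timestep in [l, r]."""
    l, r = timestep_range(step)
    return random.uniform(l, r)
```

Over a 1500-step optimization, `timestep_range(0)` yields (0.02, 0.5), and from step 1000 onward the upper bound stays at 0.2, concentrating late SDS updates on lower-noise timesteps.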