EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
Authors: Kaizhi Zheng, Xiaotong Chen, Xuehai He, Jing Gu, Linjie Li, Zhengyuan Yang, Kevin Lin, Jianfeng Wang, Lijuan Wang, Xin Wang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the experiments, we quantitatively assess the performance of EditRoom in scenarios with single-operation commands. From the results, we find that EditRoom outperforms other baselines in all metrics across different room types and editing types, which indicates higher precision and coherence in single-operation editing. Furthermore, we qualitatively evaluate EditRoom in scenarios involving multi-operation commands. We find that the model can successfully generalize to these scenarios even though we do not train the model on multi-operation data. |
| Researcher Affiliation | Collaboration | Kaizhi Zheng¹, Xiaotong Chen², Xuehai He¹, Jing Gu¹, Linjie Li³, Zhengyuan Yang³, Kevin Lin³, Jianfeng Wang³, Lijuan Wang³, Xin Eric Wang¹ (¹UC Santa Cruz; ²University of Michigan, Ann Arbor; ³Microsoft) |
| Pseudocode | No | The paper describes the EditRoom method with two primary modules (Command Parameterizer and Scene Editor) and illustrates its architecture in Figure 1 and Figure 2, detailing the components and their interactions. However, it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' block with structured, step-by-step procedures. |
| Open Source Code | No | Project website: https://eric-ai-lab.github.io/edit-room.github.io/. The paper mentions a project website but does not contain an unambiguous statement that the source code for the methodology is released, nor does the provided URL directly link to a code repository. |
| Open Datasets | Yes | The dataset is generated through an automated data augmentation pipeline that produces editing pairs based on object-level modifications applied to scenes from the 3D-FRONT dataset (Fu et al., 2021a). We utilize scenes from the bedroom, dining room, and living room categories and enhance them with high-quality object models from the 3D-FUTURE dataset (Fu et al., 2021c) to simulate real-world editing workflows. |
| Dataset Splits | Yes | The resulting dataset consists of approximately 83,000 training samples and 7,800 test samples (randomly sampled across editing types for each room type). Table 1 provides a detailed breakdown of the dataset statistics across the three scene categories. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or other detailed computing specifications used for running the experiments. It only mentions the training parameters. |
| Software Dependencies | No | The paper mentions several software components and models used, such as 'GPT-4o', 'LLaVA-1.6', 'CLIP-ViT-B32 text encoder', 'OpenShape', 'VQ-VAE model', and 'Sentence-BERT (S-BERT)'. However, it does not provide specific version numbers for any of these software dependencies or libraries, which is required for reproducibility. |
| Experiment Setup | Yes | Training is conducted using the AdamW optimizer over 300 epochs, with a batch size of 512 and a learning rate of 2×10⁻⁴. All models are individually trained and tested on each room type. |
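The reported experiment setup can be captured as a small configuration fragment. This is a hedged sketch for illustration only: the authors' code is not released, so the structure and key names below are assumptions; only the hyperparameter values (AdamW, 300 epochs, batch size 512, learning rate 2×10⁻⁴) come from the paper.

```python
# Illustrative training configuration reflecting the values stated in the paper.
# Key names and structure are hypothetical; they do not come from the authors' code.
train_config = {
    "optimizer": "AdamW",       # Adam with decoupled weight decay (as stated)
    "epochs": 300,              # total training epochs
    "batch_size": 512,          # samples per optimizer step
    "learning_rate": 2e-4,      # 2 x 10^-4
    "per_room_type": True,      # models trained/tested separately per room type
}
```

Note that the paper does not report weight decay, learning-rate schedule, or seed values, so a faithful reproduction would still require choices beyond this fragment.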