3DIS: Depth-Driven Decoupled Image Synthesis for Universal Multi-Instance Generation
Authors: Dewei Zhou, Ji Xie, Zongxin Yang, Yi Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on COCO-Position and COCO-MIG benchmarks demonstrate that 3DIS significantly outperforms existing methods in both layout precision and attribute rendering. Notably, 3DIS offers seamless compatibility with diverse foundational models, providing a robust, adaptable solution for advanced multi-instance generation. |
| Researcher Affiliation | Academia | Dewei Zhou¹, Ji Xie¹, Zongxin Yang², Yi Yang¹ — ¹RELER, CCAI, Zhejiang University; ²DBMI, HMS, Harvard University. Email: {Zongxin Yang}@hms.harvard.edu |
| Pseudocode | No | The paper describes the 3DIS framework and its three key components: Scene Depth Map Generation, Layout Control, and Detail Rendering, in prose and with mathematical equations (e.g., for Cross Attention and filtering), but does not include a distinct pseudocode block or algorithm section. |
| Open Source Code | Yes | The code is available at: https://github.com/limuloo/3DIS. |
| Open Datasets | Yes | We conducted extensive experiments on two benchmarks to evaluate the performance of 3DIS: (i) COCO-Position (Lin et al., 2015; Zhou et al., 2024a): Evaluated the layout accuracy and coarse-grained category attributes of the scene depth maps. (ii) COCO-MIG (Zhou et al., 2024a): Assessed the fine-grained rendering capabilities. ... In alignment with this approach, we utilized the COCO dataset (Lin et al., 2015) for training. |
| Dataset Splits | No | We utilized a training set comprising 5,878 images from the LAION-art dataset (Schuhmann et al., 2021), selecting only those with a resolution exceeding 512x512 pixels and an aesthetic score of 8.0. ... We utilized the COCO dataset (Lin et al., 2015) for training. ... For a comprehensive evaluation, each model generated 750 images across both benchmarks. The paper mentions training data size for LAION-art and evaluation images for COCO-MIG/Position, but lacks specific training/validation/test splits for either dataset. |
| Hardware Specification | No | The paper does not explicitly provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | The paper mentions using LDM3D, the AdamW optimizer, Stanza (Qi et al., 2020), and Grounding-DINO (Liu et al., 2023) as tools and models, but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We employed pyramid noise (Kasiopy, 2023) to fine-tune the LDM3D model for 2,000 steps, utilizing the AdamW (Kingma & Ba, 2017) optimizer with a constant learning rate of 1e-4, a weight decay of 1e-2, and a batch size of 320. |
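The paper describes its layout-control and detail-rendering components with cross-attention equations but, as noted above, provides no pseudocode. For orientation, here is a generic sketch of the cross-attention primitive those equations build on (plain softmax(QKᵀ/√d)V); this is not 3DIS's exact formulation, and the function name and weight-matrix arguments are illustrative assumptions:

```python
import numpy as np

def cross_attention(x, context, w_q, w_k, w_v):
    """Generic cross-attention sketch (not the 3DIS-specific variant).

    x:       (n, d) image-token features (queries)
    context: (m, d) conditioning features, e.g. text tokens (keys/values)
    w_q/w_k/w_v: (d, d) projection matrices
    """
    q = x @ w_q        # (n, d) queries
    k = context @ w_k  # (m, d) keys
    v = context @ w_v  # (m, d) values
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product
    # numerically stable softmax over the key axis
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ v    # (n, d) attended output
```

3DIS additionally filters and routes this attention per instance (per the paper's filtering equations), which this generic sketch omits.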
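The setup row above cites pyramid noise (Kasiopy, 2023) for fine-tuning LDM3D. The 3DIS training code is not reproduced here, but the general pyramid-noise idea — summing Gaussian noise at progressively coarser resolutions with geometrically decaying weights, then renormalizing — can be sketched as follows; the function name, `discount` value, and nearest-neighbor upsampling are assumptions, not details from the paper:

```python
import numpy as np

def pyramid_noise(shape, discount=0.8, rng=None):
    """Hypothetical pyramid-noise sketch: multi-resolution Gaussian noise.

    Full-resolution noise plus coarser noise fields upsampled by
    nearest-neighbor, each level i weighted by discount**i, then
    renormalized to unit standard deviation.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    noise = rng.standard_normal((h, w))
    i = 1
    while True:
        fh, fw = h // (2 ** i), w // (2 ** i)
        if fh < 1 or fw < 1:
            break  # stop once a level collapses below 1 pixel
        coarse = rng.standard_normal((fh, fw))
        # nearest-neighbor upsample back to (h, w)
        up = np.repeat(np.repeat(coarse, h // fh, axis=0), w // fw, axis=1)
        up = np.pad(up, ((0, h - up.shape[0]), (0, w - up.shape[1])), mode="edge")
        noise = noise + (discount ** i) * up
        i += 1
    return noise / noise.std()
```

Relative to i.i.d. Gaussian noise, the coarse levels inject low-frequency structure, which is the property the fine-tuning recipe relies on.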