3DIS: Depth-Driven Decoupled Image Synthesis for Universal Multi-Instance Generation

Authors: Dewei Zhou, Ji Xie, Zongxin Yang, Yi Yang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the COCO-Position and COCO-MIG benchmarks demonstrate that 3DIS significantly outperforms existing methods in both layout precision and attribute rendering. Notably, 3DIS offers seamless compatibility with diverse foundational models, providing a robust, adaptable solution for advanced multi-instance generation.
Researcher Affiliation | Academia | Dewei Zhou¹, Ji Xie¹, Zongxin Yang², Yi Yang¹. ¹RELER, CCAI, Zhejiang University; ²DBMI, HMS, Harvard University. Email: {Zongxin Yang}@hms.harvard.edu
Pseudocode | No | The paper describes the 3DIS framework and its three key components (Scene Depth Map Generation, Layout Control, and Detail Rendering) in prose and with mathematical equations (e.g., for cross-attention and filtering), but does not include a distinct pseudocode block or algorithm section.
Open Source Code | Yes | The code is available at: https://github.com/limuloo/3DIS.
Open Datasets | Yes | We conducted extensive experiments on two benchmarks to evaluate the performance of 3DIS: (i) COCO-Position (Lin et al., 2015; Zhou et al., 2024a), which evaluated the layout accuracy and coarse-grained category attributes of the scene depth maps, and (ii) COCO-MIG (Zhou et al., 2024a), which assessed fine-grained rendering capabilities. ... In alignment with this approach, we utilized the COCO dataset (Lin et al., 2015) for training.
Dataset Splits | No | We utilized a training set comprising 5,878 images from the LAION-art dataset (Schuhmann et al., 2021), selecting only those with a resolution exceeding 512x512 pixels and an aesthetic score of 8.0. ... We utilized the COCO dataset (Lin et al., 2015) for training. ... For a comprehensive evaluation, each model generated 750 images across both benchmarks. The paper reports the training-set size for LAION-art and the number of evaluation images for COCO-MIG/Position, but specifies no training/validation/test splits for either dataset.
Hardware Specification | No | The paper does not explicitly provide hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions using LDM3D, the AdamW optimizer, Stanza (Qi et al., 2020), and Grounding-DINO (Liu et al., 2023) as tools and models, but it does not provide version numbers for any of these software dependencies.
Experiment Setup | Yes | We employed pyramid noise (Kasiopy, 2023) to fine-tune the LDM3D model for 2,000 steps, utilizing the AdamW (Kingma & Ba, 2017) optimizer with a constant learning rate of 1e-4, a weight decay of 1e-2, and a batch size of 320.
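The quoted setup names pyramid noise (Kasiopy, 2023) as the fine-tuning perturbation. Below is a minimal NumPy sketch of pyramid (multi-resolution) noise as it is commonly implemented: Gaussian noise is drawn at progressively coarser resolutions, upsampled, and summed with geometrically decaying weights. The `discount` factor, the nearest-neighbour upsampling, and the unit-variance renormalization are illustrative assumptions, not details taken from the paper or the 3DIS codebase.

```python
import numpy as np

def pyramid_noise(h, w, discount=0.8, rng=None):
    """Sum Gaussian noise over a resolution pyramid (illustrative sketch).

    Each coarser level is upsampled back to (h, w) and added with a
    geometrically decaying weight; the result is renormalized so the
    diffusion training target keeps roughly unit variance.
    """
    rng = rng or np.random.default_rng(0)  # fixed seed for reproducibility
    noise = rng.standard_normal((h, w))    # finest level, weight 1.0
    scale, level = 1.0, 1
    while min(h, w) >> level >= 1:
        ch, cw = h >> level, w >> level    # coarser resolution
        coarse = rng.standard_normal((ch, cw))
        # nearest-neighbour upsample back to (h, w), cropping any overshoot
        up = np.repeat(np.repeat(coarse, h // ch + 1, axis=0),
                       w // cw + 1, axis=1)[:h, :w]
        scale *= discount                  # decay weight per level
        noise += scale * up
        level += 1
    return noise / noise.std()             # renormalize to unit variance
```

The coarse levels inject long-range spatial correlations into the noise, which is the property pyramid-noise fine-tuning exploits; the decay factor controls how strongly low frequencies dominate.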