Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation

Authors: Abdelrahman Eldesokey, Peter Wonka

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments show that our approach can generate complicated scenes based on 3D layouts, outperforming the standard depth-conditioned T2I methods by two-folds on object generation success rate. Moreover, it outperforms all methods in comparison on preserving objects under layout changes."
Researcher Affiliation | Academia | Abdelrahman Eldesokey & Peter Wonka, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia, {first.last}@kaust.edu.sa
Pseudocode | No | The paper describes its methods using equations and textual descriptions, but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks, nor structured code-like procedures.
Open Source Code | Yes | Project Page: https://abdo-eldesokey.github.io/build-a-scene/ ... "The source code and the evaluation protocol are publicly available." https://github.com/abdo-eldesokey/build-a-scene
Open Datasets | Yes | "We define a set of 16 objects from the MS COCO dataset (Lin et al., 2014) and their corresponding aspect ratios."
Dataset Splits | Yes | "We sampled 100 random layouts and ran each layout with 5 different seeds for fairness."
Hardware Specification | No | The paper does not report hardware details such as GPU models, CPU types, or memory used for the experiments. It names software components such as ControlNet and Stable Diffusion v1.5, but not the underlying hardware.
Software Dependencies | Yes | "LC is based on ControlNet with Stable Diffusion v1.5 (Rombach et al., 2022)... We use a general object detector, YOLOv8 (Reis et al., 2023)... as an input to SAM (Kirillov et al., 2023)... monocular depth estimation model, i.e. Depth-Anything (Yang et al., 2024)... we employ the Omni3D (Brazil et al., 2023) detector"
Experiment Setup | Yes | "We perform T = 20 denoising steps in the quantitative comparison for efficiency and T = 40 for the qualitative results for better quality. ... We sampled 100 random layouts and ran each layout with 5 different seeds for fairness. ... We also experiment with varying τ in Section 3.3 for blending the latents. ... By applying Equation (6) for τ = 0.4T, the sofa is seamlessly integrated into the scene... When τ = 0.8T, the sofa is seamlessly blended into the scene..."
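The latent-blending schedule quoted above (object latents are blended into the scene latent for a fraction τ of the T denoising steps) can be sketched as follows. This is a minimal illustration only: the function name, the mask convention, and the hard cutoff at τ·T are assumptions, not the paper's exact Equation (6).

```python
import numpy as np

def blend_latents(z_obj, z_scene, mask, step, tau, T=20):
    """Hypothetical sketch of per-step latent blending.

    For the first tau*T denoising steps, the object latent z_obj
    overwrites the scene latent z_scene inside the object mask;
    after that cutoff, the scene latent denoises freely so the
    object integrates with its surroundings.
    """
    if step < tau * T:
        return mask * z_obj + (1.0 - mask) * z_scene
    return z_scene

# Toy latents: object latent of ones, scene latent of zeros,
# and a mask covering the left half of an 8x8 latent grid.
z_obj = np.ones((4, 8, 8))
z_scene = np.zeros((4, 8, 8))
mask = np.zeros((4, 8, 8))
mask[:, :, :4] = 1.0

early = blend_latents(z_obj, z_scene, mask, step=3, tau=0.4)   # 3 < 0.4 * 20
late = blend_latents(z_obj, z_scene, mask, step=10, tau=0.4)   # 10 >= 0.4 * 20
```

With T = 20 (the quoted efficiency setting) and τ = 0.4, blending would be active only for the first 8 denoising steps; raising τ to 0.8 would extend it to 16 steps.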