reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

SceneX: Procedural Controllable Large-Scale Scene Generation

Authors: Mengqi Zhou, Yuxi Wang, Jun Hou, Shougao Zhang, Yiwei Li, Chuanchen Luo, Junran Peng, Zhaoxiang Zhang

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments The goals of our experiments are threefold: (i) to verify the capability of SceneX for generating photorealistic large-scale scenes, including nature scenes and cities, (ii) to demonstrate the effectiveness of SceneX for personalized editing, such as adding or changing, (iii) to compare different LLMs on the proposed benchmark. Benchmark Protocol Dataset. To evaluate the effectiveness of proposed SceneX, we use GPT-4 to generate high-quality 50 scene descriptions, 50 asset descriptions, and 20 asset editing descriptions. ... We use Executability Rate (ER@1) and Success Rate (SR@1) to evaluate LLMs on our SceneX. ... Ablation Study To analyze the impact of various components within the systematic template, we conduct an ablation study based on the tree plugin in PCGHub.
Researcher Affiliation	Academia	1 University of Chinese Academy of Sciences; 2 Institute of Automation, Chinese Academy of Sciences; 3 State Key Laboratory of Multimodal Artificial Intelligence Systems; 4 New Laboratory of Pattern Recognition; 5 Centre for Artificial Intelligence and Robotics; 6 China University of Geosciences Beijing; 7 Shandong University; 8 University of Science and Technology Beijing
Pseudocode	No	The paper describes the process in stages like Scene Decomposition, Terrain Generation, Objects Generation & Retrieval, and Asset Placement, and uses mathematical formulations to describe steps, but it does not present structured pseudocode or algorithm blocks with typical pseudocode formatting.
Open Source Code	Yes	Code: https://zhouzq1.github.io/SceneX/
Open Datasets	No	To evaluate the effectiveness of proposed SceneX, we use GPT-4 to generate high-quality 50 scene descriptions, 50 asset descriptions, and 20 asset editing descriptions. The scene descriptions involve natural scenes and cities. Then, we feed them to our SceneX to generate corresponding models, which are used to perform quantitative and qualitative comparisons.
Dataset Splits	No	To evaluate the effectiveness of proposed SceneX, we use GPT-4 to generate high-quality 50 scene descriptions, 50 asset descriptions, and 20 asset editing descriptions. The scene descriptions involve natural scenes and cities. (This describes data generation for evaluation, not dataset splits for training/testing a model.)
Hardware Specification	Yes	The experiments are performed on a server equipped with dual Intel Xeon Processors (Skylake architecture), each with 20 cores, totaling 80 CPU cores.
Software Dependencies	No	The paper frequently mentions Blender and various LLMs (e.g., GPT-4, Llama2, Mistral, Gemma) and a CLIP model, but does not provide specific version numbers for any of these software dependencies.
Experiment Setup	Yes	When generating and editing the 3D scenes, we adopt the leading GPT-4 as the large language model with its public API keys. To ensure the stability of the LLM's output, we set the decoding temperature as 0.
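
The Research Type row quotes two metrics, Executability Rate (ER@1) and Success Rate (SR@1), without defining them on this page. The sketch below shows one plausible reading, not the paper's confirmed definition: it assumes ER@1 is the fraction of prompts whose first generated program runs without error, and SR@1 is the fraction whose first attempt both runs and is judged to satisfy the description. The `Attempt` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Attempt:
    """Outcome of the first LLM attempt for one benchmark prompt (hypothetical record)."""
    executed: bool   # assumption: the generated code ran without raising an error
    succeeded: bool  # assumption: the result was judged to match the description


def er_at_1(attempts: Sequence[Attempt]) -> float:
    """Assumed ER@1: share of prompts whose first attempt executed."""
    if not attempts:
        raise ValueError("need at least one attempt")
    return sum(a.executed for a in attempts) / len(attempts)


def sr_at_1(attempts: Sequence[Attempt]) -> float:
    """Assumed SR@1: share of prompts whose first attempt executed and succeeded."""
    if not attempts:
        raise ValueError("need at least one attempt")
    # An attempt that never executed is not counted as a success.
    return sum(a.executed and a.succeeded for a in attempts) / len(attempts)


if __name__ == "__main__":
    # Illustrative outcomes only; these are not results from the paper.
    demo = [
        Attempt(executed=True, succeeded=True),
        Attempt(executed=True, succeeded=False),
        Attempt(executed=False, succeeded=False),
        Attempt(executed=True, succeeded=True),
    ]
    print(f"ER@1 = {er_at_1(demo):.2f}, SR@1 = {sr_at_1(demo):.2f}")
```

Under this reading SR@1 can never exceed ER@1, which gives a quick sanity check when comparing reported numbers across LLMs.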