EG4D: Explicit Generation of 4D Object without Score Distillation
Authors: Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The qualitative results, quantitative evaluations, and user preferences validate that our EG4D outperforms SDS-based baselines by a large margin, producing 4D content with realistic 3D appearance, high image fidelity, and fine temporal consistency. Extensive ablation studies also showcase our effective solutions to the challenges in reconstructing 4D representations from synthesized videos. Section 5: EXPERIMENTS, Section 5.1: EXPERIMENTAL SETTINGS, Section 5.2: RESULTS, Section 5.3: ABLATION STUDIES. |
| Researcher Affiliation | Academia | 1University of Science and Technology of China 2City University of Hong Kong 3Cornell University |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. The methods are described using mathematical equations and descriptive text. |
| Open Source Code | Yes | Code available: github.com/jasongzy/EG4D |
| Open Datasets | Yes | The panel (a) uses video rendered from Objaverse (Deitke et al., 2023) dataset, a large-scale 3D dataset that also contains some animation models. Figure 15 (b) shows the 4D generation results from in-the-wild videos from the Consistent4D benchmark; In panel (c), we leverage the pose-conditioned character video generation model, Animate Anyone (Hu, 2024), as our video model in our framework. |
| Dataset Splits | No | The paper describes using input images and SVD-generated videos, as well as data from Objaverse and Consistent4D benchmarks. However, it does not specify any training, validation, or test dataset splits for the experiments conducted in this paper. |
| Hardware Specification | Yes | Our implementation is primarily based on the PyTorch framework and tested on a single NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions the 'PyTorch framework' and 'SDXL-Turbo (Sauer et al., 2023b)', but does not provide specific version numbers for PyTorch or other key software components. |
| Experiment Setup | Yes | In Stage I, we use SVD-img2vid-xl (Blattmann et al., 2023a) to generate 25-frame videos. For multi-view generation, we employ SV3D^p conditioned on a camera pose sequence, i.e., 21 azimuth angles (360° evenly divided) and a fixed 0° elevation. All images are set to a resolution of 576 × 576. In Stage III, we use SDXL-Turbo (Sauer et al., 2023b) with small strength (0.167) to provide the diffusion prior. In the semantic refinement stage (Stage III), we fine-tune 4DGS for 5k steps with the Adam optimizer. The initial learning rate is set to 1e-4 with exponential decay. The weight λ in the diffusion refinement loss is set to 0.5. |
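The Stage III training configuration quoted above (Adam, initial learning rate 1e-4 with exponential decay over 5k steps, diffusion-loss weight λ = 0.5) can be sketched as follows. This is a minimal illustration, not the authors' code: the decay target `FINAL_LR` is an assumption, since the paper does not state the final learning rate, and `total_loss` only names the weighting scheme, not the actual reconstruction or diffusion terms.

```python
# Hypothetical sketch of the Stage III fine-tuning schedule reported in the
# paper. FINAL_LR is an assumed decay target; the paper only states that the
# initial LR of 1e-4 decays exponentially over 5k steps.
TOTAL_STEPS = 5_000   # fine-tuning steps reported in the paper
INIT_LR = 1e-4        # initial learning rate (reported)
FINAL_LR = 1e-6       # assumed target learning rate at TOTAL_STEPS
LAMBDA_DIFF = 0.5     # weight of the diffusion refinement loss (reported)


def lr_at(step: int) -> float:
    """Exponentially decayed learning rate at a given optimization step."""
    # Per-step factor chosen so the LR reaches FINAL_LR exactly at TOTAL_STEPS.
    gamma = (FINAL_LR / INIT_LR) ** (1.0 / TOTAL_STEPS)
    return INIT_LR * gamma**step


def total_loss(recon_loss: float, diffusion_loss: float) -> float:
    """Combined objective: reconstruction term plus weighted diffusion prior."""
    return recon_loss + LAMBDA_DIFF * diffusion_loss
```

In a PyTorch implementation the same schedule would typically be realized with `torch.optim.Adam` together with `torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)`, stepping the scheduler once per optimization step.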