4K4DGen: Panoramic 4D Generation at 4K Resolution
Authors: Renjie Li, Panwang Pan, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhengzhong Tu, Zhiwen Fan
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 EXPERIMENTS 4.1 EXPERIMENTAL SETTINGS 4.2 RESULTS 4.3 ABLATION STUDIES |
| Researcher Affiliation | Collaboration | 1 Bytedance, 2 University of Texas at Austin, 3 University of California, Los Angeles, 4 Texas A&M University |
| Pseudocode | No | The paper describes its methodology using textual descriptions and mathematical equations (e.g., Eq. 1-6) but does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | Furthermore, we will make our panorama datasets and related code publicly available in the future. |
| Open Datasets | Yes | we evaluate our methodology using a dataset of 16 panoramas generated by text-to-panorama diffusion models (Yang et al., 2024). The static panoramas used in the dataset of the main draft are generated by a text-to-panorama diffusion model, fine-tuned from stable diffusion (Rombach et al., 2022) on SUN360. We present quantitative results on an additional 32 scenes randomly sampled from WEB360 dataset (Wang et al., 2024b). |
| Dataset Splits | No | The paper mentions using 16 panoramas and an additional 32 scenes randomly sampled, but does not specify any training/test/validation splits for these datasets. For evaluation, it mentions: "For the test views, we select random cameras with p = 0 as part of our testing camera set." |
| Hardware Specification | Yes | All experiments are executed on a single NVIDIA A100 GPU with 80 GB RAM. |
| Software Dependencies | No | The paper mentions specific models like "Animate-anything model (Dai et al., 2023)", "SVD model (Blattmann et al., 2023a)", and "MiDaS (Ranftl et al., 2021; Birkl et al., 2023)" but does not provide specific version numbers for these or other ancillary software components like programming languages or libraries. |
| Experiment Setup | Yes | For perspective images, we uniformly select 20 directions u on the sphere S² as the z-axis of 20 cameras. In each experiment, the image plane size s is set at 0.6 × 0.6, with a focal length f = 0.6 and a resolution of 512 × 512. For the Panoramic Animator, we set the video length L = 14, the channel number c = 9, the latent code size (h, w) = (H/8, W/8), and the perspective image size pH = pW = W/4. The sphere is uniformly divided into 20 perspective views, each with an 80° FOV. For the denoiser, the max denoising step is 25. The hyper-parameters for optimization are set as follows: λdepth = 1, λscale = 0.1, λshift = 0.01. We conduct Spatial-Temporal Geometry Alignment optimization over 3000 iterations, with λscale and λshift set to zero during the first 1500 iterations. For the 4D representation training stage, Gaussian parameters are optimized over 10000 iterations for each time stamp t. The hyper-parameters for this stage are defined as λrgb = 1, λtemporal = λsem = λgeo = 0.05, and the disturbance vector range α is varied at 0.05, 0.1, and 0.2 during the 5400, 6600, and 9000 iterations, respectively. |
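The reported 4D-training hyper-parameters can be expressed compactly as a config. This is a minimal sketch, not code from the paper: the names `LOSS_WEIGHTS`, `total_loss`, and `alpha_schedule` are hypothetical, the weighted-sum form of the loss is an assumption, and the paper's milestone phrasing ("varied at 0.05, 0.1, and 0.2 during the 5400, 6600, and 9000 iterations") is read here as α holding each value up to its milestone and keeping 0.2 afterwards.

```python
# Hypothetical encoding of the reported 4D-representation-stage settings
# (lambda_rgb = 1, lambda_temporal = lambda_sem = lambda_geo = 0.05).
LOSS_WEIGHTS = {"rgb": 1.0, "temporal": 0.05, "sem": 0.05, "geo": 0.05}

def total_loss(losses):
    """Weighted sum of per-term losses; the combination form is assumed,
    only the weights are quoted from the paper."""
    return sum(LOSS_WEIGHTS[name] * value for name, value in losses.items())

def alpha_schedule(iteration):
    """Disturbance-vector range alpha over the 10000-iteration stage.
    Assumes each milestone bounds its value and 0.2 persists past 9000."""
    if iteration < 5400:
        return 0.05
    if iteration < 6600:
        return 0.1
    return 0.2
```

For example, `alpha_schedule(6000)` yields 0.1 under this reading, and a batch whose only nonzero term is an RGB loss of 2.0 gives `total_loss(...) == 2.0` since λrgb = 1.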