SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Authors: Jianhong Bai, Menghan Xia, Xintao Wang, Ziyang Yuan, Zuozhu Liu, Haoji Hu, Pengfei Wan, Di Zhang

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments show that SynCamMaster can generate consistent content from different viewpoints of the same scene, and achieves excellent inter-view synchronization. Ablation studies highlight the advantages of our key design choices. Furthermore, our method can be easily extended for novel view synthesis in videos by introducing a reference video to our multi-camera video generation model. Our contribution can be summarized as follows: ... Extensive experiments show the proposed SynCamMaster outperforms baselines by a large margin.
Researcher Affiliation | Collaboration | ¹Zhejiang University, ²Kuaishou Technology, ³Tsinghua University
Pseudocode | No | The paper describes the methodology using prose, mathematical equations (e.g., Eq. 1-7), and diagrams (e.g., Figure 2 for the model overview), but does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/KwaiVGI/SynCamMaster.
Open Datasets | Yes | We also release a multi-view synchronized video dataset, named SynCamVideo-Dataset. Our code is available at https://github.com/KwaiVGI/SynCamMaster. ... DL3DV-10K (Ling et al., 2024) ... RealEstate-10K (Zhou et al., 2018) ... Human3.6M (Ionescu et al., 2013) ... Panoptic Studio (Joo et al., 2015) ... Objaverse (Deitke et al., 2023) ... Co3D (Reizenstein et al., 2021) and MVImgNet (Yu et al., 2023)
Dataset Splits | No | The paper mentions data usage probabilities during training (e.g., "We joint train our model on multi-view video data, multi-view image data, and single-view video data with the probability of 0.6, 0.2, and 0.2 respectively") and describes the construction of an evaluation set ("We construct the evaluation set with 100 manually collected text prompts, and inference with 4 viewpoints each, resulting in 400 videos in total."). However, it does not provide specific training/validation/test splits (e.g., percentages or exact counts) for any of the datasets needed to reproduce the experiments.
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions tools and frameworks like "UNet-based explorations", "Transformer-based scaling laws", "3D Variational Auto-Encoder (VAE)", the "Rectified Flow framework", "SAM (Kirillov et al., 2023)", and "CoTracker (Karaev et al., 2023)". However, it does not specify version numbers for any of these software components or libraries.
Experiment Setup | Yes | We joint train our model on multi-view video data, multi-view image data, and single-view video data with the probability of 0.6, 0.2, and 0.2 respectively. We train the model for 50K steps at a resolution of 384x672 with a learning rate of 0.0001 and a batch size of 32. The view-attention module is initialized with the weights of the temporal-attention module, and the camera encoder and the projector are zero-initialized.
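The joint-training mix quoted above (probabilities 0.6 / 0.2 / 0.2 over the three data sources) can be sketched as a per-step source sampler. This is a hypothetical illustration, not the authors' code; the source names and the `sample_source` helper are assumptions introduced here for clarity.

```python
import random

# Hypothetical sketch of the joint-training data mix described in the paper:
# each training step draws its batch from one of three data sources with
# fixed probabilities (0.6 multi-view video, 0.2 multi-view image,
# 0.2 single-view video).
SOURCES = ["multi_view_video", "multi_view_image", "single_view_video"]
PROBS = [0.6, 0.2, 0.2]

def sample_source(rng: random.Random) -> str:
    """Pick the data source for the next training step."""
    return rng.choices(SOURCES, weights=PROBS, k=1)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    counts = {s: 0 for s in SOURCES}
    for _ in range(10_000):
        counts[sample_source(rng)] += 1
    # Empirical frequencies should land close to 0.6 / 0.2 / 0.2.
    print({s: round(c / 10_000, 2) for s, c in counts.items()})
```

Over many steps the empirical source frequencies converge to the stated probabilities, matching the "probability of 0.6, 0.2, and 0.2" wording in the quoted setup.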