CameraCtrl: Enabling Camera Control for Video Diffusion Models
Authors: Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the effectiveness of CameraCtrl in achieving precise camera control with different video generation models, marking a step forward in the pursuit of dynamic and customized video storytelling from textual and camera pose inputs. |
| Researcher Affiliation | Academia | Hao He1,2 Yinghao Xu3 Yuwei Guo1,2 Gordon Wetzstein3 Bo Dai2 Hongsheng Li1 Ceyuan Yang2 1 The Chinese University of Hong Kong 2 Shanghai Artificial Intelligence Laboratory 3 Stanford University |
| Pseudocode | No | The paper describes methods and procedures using figures and textual descriptions, but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The project website is at: https://hehao13.github.io/projects-CameraCtrl/. |
| Open Datasets | Yes | We choose three datasets as the candidates: Objaverse (Deitke et al., 2023), MVImageNet (Yu et al., 2023), and RealEstate10K (Zhou et al., 2018). ... WebVid10M (Bain et al., 2021), which is used to train the base video diffusion model. |
| Dataset Splits | Yes | We choose RealEstate10K as the dataset, with around 65K video clips for training. ... For the RotErr and TransErr, we have randomly chosen 1,000 videos and the corresponding camera poses from the RealEstate10K test set. |
| Hardware Specification | Yes | We used 16 80 GB NVIDIA A100 GPUs to train the CameraCtrl models with a batch size of 2 per GPU for 50K steps, taking about 25 hours. ... We used 32 80 GB NVIDIA A100 GPUs to train the models with a batch size of 1 per GPU for 50K steps, taking about 40 hours. |
| Software Dependencies | No | We use the LAVIS (Li et al., 2023) to generate the text prompts for each video clip of the used dataset... The paper mentions specific optimizers and noise schedulers but does not provide version numbers for any software libraries, programming languages, or development environments used. |
| Experiment Setup | Yes | We use the AdamW optimizer to train our model with a constant learning rate of 1×10⁻⁴ (T2V) or 3×10⁻⁵ (I2V). ... with a batch size of 32 for 50K steps. ... We use a linear beta noise schedule, where βstart = 0.00085, βend = 0.012, and T = 1000. |
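The linear beta noise schedule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration built only from the three reported values (βstart = 0.00085, βend = 0.012, T = 1000); the paper does not specify the exact interpolation code, so a plain linear spacing is assumed here (Stable Diffusion-style models sometimes interpolate in sqrt space instead).

```python
import numpy as np

def linear_beta_schedule(beta_start=0.00085, beta_end=0.012, T=1000):
    """Sketch of a DDPM-style linear beta schedule with the values
    reported in the paper. The interpolation form is an assumption."""
    # Evenly spaced noise variances beta_t for t = 1..T
    betas = np.linspace(beta_start, beta_end, T)
    # Cumulative product of (1 - beta_t), i.e. alpha-bar_t,
    # which parameterizes the forward process q(x_t | x_0)
    alphas_cumprod = np.cumprod(1.0 - betas)
    return betas, alphas_cumprod

betas, alphas_cumprod = linear_beta_schedule()
```

The cumulative product `alphas_cumprod` decays monotonically from near 1 toward 0, so later timesteps correspond to noisier latents, as in standard DDPM training.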