CameraCtrl: Enabling Camera Control for Video Diffusion Models
Authors: Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate the effectiveness of CameraCtrl in achieving precise camera control with different video generation models, marking a step forward in the pursuit of dynamic and customized video storytelling from textual and camera pose inputs. |
| Researcher Affiliation | Academia | Hao He1,2 Yinghao Xu3 Yuwei Guo1,2 Gordon Wetzstein3 Bo Dai2 Hongsheng Li1 Ceyuan Yang2 1 The Chinese University of Hong Kong 2 Shanghai Artificial Intelligence Laboratory 3 Stanford University |
| Pseudocode | No | The paper describes methods and procedures using figures and textual descriptions, but it does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The project website is at: https://hehao13.github.io/projects-CameraCtrl/. |
| Open Datasets | Yes | We choose three datasets as the candidates: Objaverse (Deitke et al., 2023), MVImageNet (Yu et al., 2023), and RealEstate10K (Zhou et al., 2018). ... WebVid10M (Bain et al., 2021), which is used to train the base video diffusion model. |
| Dataset Splits | Yes | We choose RealEstate10K as the dataset, with around 65K video clips for training. ... For the RotErr and TransErr, we have randomly chosen 1,000 videos and the corresponding camera poses from the RealEstate10K test set. |
| Hardware Specification | Yes | We used 16 80 GB NVIDIA A100 GPUs to train the CameraCtrl models with a batch size of 2 per GPU for 50K steps, taking about 25 hours. ... We used 32 80 GB NVIDIA A100 GPUs to train the models with a batch size of 1 per GPU for 50K steps, taking about 40 hours. |
| Software Dependencies | No | We use the LAVIS (Li et al., 2023) to generate the text prompts for each video clip of the used dataset... The paper mentions specific optimizers and noise schedulers but does not provide version numbers for any software libraries, programming languages, or development environments used. |
| Experiment Setup | Yes | We use the AdamW optimizer to train our model with a constant learning rate of 1×10⁻⁴ (T2V) or 3×10⁻⁵ (I2V). ... with a batch size of 32 for 50K steps. ... We use a linear beta noise schedule, where βstart = 0.00085, βend = 0.012, and T = 1000. |
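The linear beta noise schedule quoted in the Experiment Setup row can be sketched as follows. This is a minimal illustration built only from the three reported values (βstart = 0.00085, βend = 0.012, T = 1000); the paper does not specify the exact interpolation code, so a plain linear spacing is assumed here (Stable Diffusion-style models sometimes interpolate in sqrt space instead).

```python
import numpy as np

def linear_beta_schedule(beta_start=0.00085, beta_end=0.012, T=1000):
    """Sketch of a DDPM-style linear beta schedule with the values
    reported in the paper. The interpolation form is an assumption."""
    # Evenly spaced noise variances beta_t for t = 1..T
    betas = np.linspace(beta_start, beta_end, T)
    # Cumulative product of (1 - beta_t), i.e. alpha-bar_t,
    # which parameterizes the forward process q(x_t | x_0)
    alphas_cumprod = np.cumprod(1.0 - betas)
    return betas, alphas_cumprod

betas, alphas_cumprod = linear_beta_schedule()
```

The cumulative product `alphas_cumprod` decays monotonically from near 1 toward 0, so later timesteps correspond to noisier latents, as in standard DDPM training.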