I2VControl-Camera: Precise Video Camera Control with Adjustable Motion Strength

Authors: Wanquan Feng, Jiawei Liu, Pengqi Tu, Tianhao Qi, Mingzhen Sun, Tianxiang Ma, Songtao Zhao, Siyu Zhou, Qian He

ICLR 2025

Reproducibility Variable Result LLM Response
Research Type | Experimental | "Experiments on static and dynamic scenes show that our framework outperforms previous methods both quantitatively and qualitatively."
Researcher Affiliation | Collaboration | 1. ByteDance, China; 2. University of Science and Technology of China (USTC); 3. Institute of Automation, Chinese Academy of Sciences (CASIA)
Pseudocode | Yes | Algorithm 1: "Static and dynamic region extraction based on trajectory analysis"
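The paper's Algorithm 1 itself is not reproduced in this report; as a hedged illustration only, a trajectory-based static/dynamic split of the kind the title describes might threshold each tracked point's maximum displacement. The function name, data layout, and thresholding rule below are assumptions, not the paper's exact procedure:

```python
from math import hypot

def split_static_dynamic(trajectories, threshold=2.0):
    """Split tracked points into static and dynamic sets.

    trajectories: list of point tracks, each a list of (x, y) positions
    over T frames (in pixels).
    threshold: maximum displacement (pixels) for a point to count as static.
    NOTE: illustrative sketch; the paper's Algorithm 1 may differ in detail.
    Returns (static_indices, dynamic_indices).
    """
    static, dynamic = [], []
    for i, track in enumerate(trajectories):
        x0, y0 = track[0]
        # Maximum displacement of the point relative to its first-frame position.
        max_disp = max(hypot(x - x0, y - y0) for x, y in track)
        (static if max_disp <= threshold else dynamic).append(i)
    return static, dynamic
```

A region-level variant would then group pixels by which class their nearest trajectory falls into; that grouping step is omitted here.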
Open Source Code | Yes | Project page: https://wanquanf.github.io/I2VControlCamera
Open Datasets | Yes | Previous methods used the RealEstate10K (Zhou et al., 2018) dataset for training... "To compute FID, we randomly select 2000 video frames from WebVid (Bain et al., 2021)."
Dataset Splits | Yes | "we collect a dataset of 30K video clips as our training set... The first testing set comprises 500 random static-scene clips from RealEstate10K... The second testing set consists of 480 samples generated by a text-to-image model..."
Hardware Specification | Yes | "We use 16 NVIDIA A100 GPUs to train them with a batch size of 1 per GPU for 20K steps, taking about 36 hours."
Software Dependencies | No | The paper mentions using the "Image-to-Video version of MagicVideo-V2" as a base model and the "RAFT optical flow model", but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | "We set the frame number as 24 and the resolution as 704×448. We use 16 NVIDIA A100 GPUs to train them with a batch size of 1 per GPU for 20K steps... During training, we fix the parameters of the base model and only train our adapter part. We adopt the same loss function and the same scheduler, with the sole modification being the introduction of the control signal condition."
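The setup above can be summarized as a configuration plus a freeze-base/train-adapter parameter selection. This is a minimal sketch mirroring only the numbers quoted in the report; the base model (MagicVideo-V2) and its training code are not public, and the `adapter.` name prefix is an illustrative assumption:

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Values quoted in the paper's experiment setup; all else is unknown.
    num_frames: int = 24
    resolution: tuple = (704, 448)
    num_gpus: int = 16              # NVIDIA A100
    batch_size_per_gpu: int = 1
    total_steps: int = 20_000

def select_trainable(named_params):
    """Freeze base-model parameters and mark only the adapter as trainable.

    named_params: iterable of (name, param) pairs where each param exposes a
    `requires_grad` attribute (as in common deep-learning frameworks).
    The 'adapter.' prefix is a hypothetical naming convention.
    """
    trainable = []
    for name, param in named_params:
        param.requires_grad = name.startswith("adapter.")
        if param.requires_grad:
            trainable.append(name)
    return trainable
```

In a real framework the same effect is achieved by setting `requires_grad = False` on the base model's parameters and passing only the adapter's parameters to the optimizer.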