Re-Attentional Controllable Video Diffusion Editing

Authors: Yuanzhi Wang, Yong Li, Mengyi Liu, Xiaoya Zhang, Xin Liu, Zhen Cui, Antoni B. Chan

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results verify that ReAtCo consistently improves the controllability of video diffusion editing and achieves superior editing performance. Extensive experiments yield superior or comparable results, demonstrating that ReAtCo mitigates the limitations of existing state-of-the-art methods, such as mislocated objects and an incorrect number of objects.
Researcher Affiliation | Collaboration | (1) School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China; (2) Department of Content Security, Kuaishou Technology, Beijing, China; (3) Department of Computer Science, City University of Hong Kong, Hong Kong, China; (4) Seeta Cloud, Nanjing, China
Pseudocode | No | The paper describes the methodology using mathematical equations and textual explanations, but it does not include any explicitly labeled pseudocode blocks or algorithm listings.
Open Source Code | Yes | Code: https://github.com/mdswyz/ReAtCo
Open Datasets | Yes | We conduct experiments on the text-guided video editing dataset LOVEU-TGVE-2023 (Wu et al. 2023b), the video samples used in (Chai et al. 2023), and the video samples from (Videvo 2024). Each video has 4 different edited prompts for evaluation. Videvo (2024). Free stock video footage. https://www.videvo.net/. Accessed: 2024-12-23.
Dataset Splits | No | The paper states that experiments are conducted on specific datasets and that "Each video has 4 different edited prompts for evaluation," but it does not specify any training, validation, or test splits or percentages for the models used in the experiments.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch, or CUDA versions).
Experiment Setup | Yes | For specifying object regions, we let the user provide them in the simplest possible way, i.e., via bounding boxes. We adopt three standard evaluation metrics proposed in (Wu et al. 2023b) to measure the quality of edited videos. In the experiments, K is set to 20% of the number of mask-region pixels, so that K adapts to the size of the mask.
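The adaptive choice of K described in the setup (20% of the mask-region size) can be sketched as follows. This is an illustrative reconstruction, not code from the paper: the function name `adaptive_k` and the boolean-mask input are assumptions for the example.

```python
import numpy as np

def adaptive_k(mask: np.ndarray, ratio: float = 0.2) -> int:
    """Set K to a fixed fraction of the mask-region size.

    `mask` is a boolean array marking the object region to edit;
    K scales with the region, so larger masks get a larger K.
    """
    region_size = int(mask.sum())          # number of pixels inside the mask
    return max(1, round(ratio * region_size))  # at least 1

# Example: a 4x4 mask with 8 foreground pixels -> K = round(0.2 * 8) = 2
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :] = True
print(adaptive_k(mask))  # -> 2
```

Tying K to the mask size means the same ratio works for both small and large edited objects without per-video tuning.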