TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation

Authors: Xingrui Wang, Xin Li, Yaosi Hu, Hanxin Zhu, Chen Hou, Cuiling Lan, Zhibo Chen

AAAI 2025

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We perform experiments on two categories of existing datasets, and the model performs well under various control conditions. Extensive results have demonstrated that our proposed method can achieve state-of-the-art performance." |
| Researcher Affiliation | Academia | ¹University of Science and Technology of China, ²The Hong Kong Polytechnic University. EMAIL, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes its methods and equations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that source code for the methodology is released, nor does it provide a link to a code repository. |
| Open Datasets | Yes | MAGE (Hu, Luo, and Chen 2022) introduces five datasets for evaluating this task, comprising three MNIST datasets and two CATER datasets. MNIST datasets: Single Moving MNIST contains a single digit, whereas Double Moving MNIST (Mittal, Marwah, and Balasubramanian 2017) features pairs of digits moving in various directions: top to bottom, bottom to top, left to right, and right to left. CATER datasets: MAGE (Hu, Luo, and Chen 2022) provides two datasets, namely CATER-GEN-v1 and CATER-GEN-v2, which remain unaltered in this paper. |
| Dataset Splits | No | The paper mentions using various datasets and replicating results from prior work, but it does not specify explicit training, validation, or test splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not specify any software dependencies or their version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | No | While the paper mentions video resolutions (e.g., 'The video resolution is 64×64 pixels.' and 'resized to 128×128') and states that 'During training, the parameters of Enc remain fixed', it does not provide specific hyperparameters such as learning rate, batch size, number of epochs, or optimizer settings needed for a reproducible experimental setup. |