Follow-Your-Click: Open-domain Regional Image Animation via Motion Prompts
Authors: Yue Ma, Yingqing He, Hongfa Wang, Andong Wang, Leqi Shen, Chenyang Qi, Jixuan Ying, Chengfei Cai, Zhifeng Li, Heung-Yeung Shum, Wei Liu, Qifeng Chen
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments compared with 7 baselines, including both commercial tools and research methods on 8 metrics, suggest the superiority of our approach. We conducted extensive experiments and user studies to evaluate our approach, which shows our method achieves state-of-the-art performance. In this section, we introduce our detailed implementation in Sec. 4.1. Then we evaluate our approach with various baselines to comprehensively evaluate our performance in Sec. 4.2. We then ablate our key components to show their effectiveness in Sec. 4.3. |
| Researcher Affiliation | Collaboration | 1The Hong Kong University of Science and Technology, Hong Kong 2Tencent, Hunyuan, China 3Tsinghua University, China |
| Pseudocode | No | The paper describes steps in regular paragraph text within sections like '3 Follow-Your-Click' and its subsections, but it does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code https://follow-your-click.github.io/ |
| Open Datasets | Yes | We train our model for 60k steps on the WebVid-10M (Bain et al. 2021) and then finetune it for 30k steps on the reconstructed WebVid-Motion dataset. Training on public datasets such as WebVid (Bain et al. 2021) and HD-VILA (Xue et al. 2022) directly is challenging... |
| Dataset Splits | No | The paper mentions using WebVid-10M, WebVid-Motion, UCF-101, and MSRVTT datasets but does not explicitly provide details about specific training, validation, or test splits (e.g., percentages or sample counts) used for these datasets in their experiments, nor does it refer to standard splits for their specific evaluation setup. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, memory) used to run its experiments. It only mentions the software modules used: 'In our experiments, the spatial modules are based on Stable Diffusion (SD) V1.5 (Rombach et al. 2022), and motion modules use the corresponding AnimateDiff (Guo et al. 2023) checkpoint V2.' |
| Software Dependencies | Yes | In our experiments, the spatial modules are based on Stable Diffusion (SD) V1.5 (Rombach et al. 2022), and motion modules use the corresponding AnimateDiff (Guo et al. 2023) checkpoint V2. |
| Experiment Setup | Yes | We train our model for 60k steps on the WebVid-10M (Bain et al. 2021) and then finetune it for 30k steps on the reconstructed WebVid-Motion dataset. We measure these metrics at the resolution of 256 x 256 with 16 frames. In Sec. 4.3, we conduct a detailed analysis of the selection of mask ratio. |
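The reported setup amounts to a two-stage training schedule. A minimal sketch of that schedule is below; the paper reports only the step counts, datasets, resolution, and frame count, so the `StageConfig` structure and field names here are hypothetical, not the authors' actual training code.

```python
from dataclasses import dataclass


@dataclass
class StageConfig:
    # Hypothetical config container; only the values below are reported in the paper.
    dataset: str
    steps: int
    resolution: int = 256  # evaluated at 256 x 256
    num_frames: int = 16   # 16-frame clips


def training_schedule() -> list[StageConfig]:
    """Two-stage schedule as reported: 60k pretraining steps on WebVid-10M,
    then 30k finetuning steps on the reconstructed WebVid-Motion dataset."""
    return [
        StageConfig(dataset="WebVid-10M", steps=60_000),
        StageConfig(dataset="WebVid-Motion", steps=30_000),
    ]
```

Note that the mask-ratio choice the paper ablates in Sec. 4.3 is not captured here, since no concrete value is quoted in this row.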