VidEvo: Evolving Video Editing through Exhaustive Temporal Modeling
Authors: Sizhe Dang, Huan Liu, Mengmeng Wang, Xin Lai, Guang Dai, Jingdong Wang
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental evaluations show that VidEvo enhances frame-to-frame temporal consistency. Ablation studies confirm NVE and WFA's effectiveness and their plug-and-play capability with other methods. In this section, we present quantitative and qualitative analyses, ablation studies, and orthogonality analyses. Our method is primarily evaluated on the DAVIS [Pont-Tuset et al., 2017] dataset for comparison with existing works. |
| Researcher Affiliation | Collaboration | Xi'an Jiaotong University; Zhejiang University of Technology; SGIT AI Lab, State Grid Corporation of China; Baidu Inc. |
| Pseudocode | Yes | Algorithm 1 VidEvo video editing |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide any links to a code repository. There is no mention of code being included in supplementary materials. |
| Open Datasets | Yes | Our method is primarily evaluated on the DAVIS [Pont-Tuset et al., 2017] dataset for comparison with existing works. |
| Dataset Splits | No | The paper mentions using the DAVIS dataset but does not specify any particular training, validation, or test splits. It implies standard usage for comparison but provides no details on how the data was partitioned. |
| Hardware Specification | No | The paper discusses memory usage and runtime in Table 1 and Section 4.4, providing values like "10.4GB for pipeline memory and 19GB for tuning" or "17.2GB for pipeline memory alongside 6.6GB for NVE tuning." However, it does not specify any concrete hardware components such as specific GPU or CPU models used for these measurements or experiments. |
| Software Dependencies | No | The paper mentions general models like "Stable Diffusion Model" and "CLIP model" and a framework "P2P-based editing methods," but it does not specify any programming languages, libraries, or solvers with their respective version numbers that would be required for reproducibility. |
| Experiment Setup | Yes | Employing DDIM inversion with a default guidance scale of w = 7.5, our objective at each time step t is to minimize the following: \(z_{t-1} \leftarrow z_{t-1}(z^I_t, \phi_t, C)\) ... To address these issues, we propose the WFA mechanism as an alternative to traditional self-attention. As shown in Fig. 4, this mechanism uses a window size of \(\lambda\) (e.g., 3) to allow each token... Our ablation studies reveal that for videos with minimal motion, a window size of 3 for our WFA effectively maintains temporal consistency and achieves robust results without significant computational overhead. |
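The Experiment Setup row quotes a DDIM inversion procedure with a classifier-free guidance scale of w = 7.5. The paper's exact update is not reproduced here; the following is a minimal generic sketch of the two standard ingredients that setup relies on — the classifier-free guidance blend of noise predictions and one deterministic DDIM step — with function names and the numpy framing being our own illustrative choices, not the paper's code.

```python
import numpy as np

def guided_eps(eps_uncond, eps_cond, w=7.5):
    """Classifier-free guidance: blend unconditional and conditional
    noise predictions with guidance scale w (w = 7.5 in the paper)."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def ddim_step(z_t, eps, alpha_t, alpha_prev):
    """One deterministic DDIM update z_t -> z_{t-1}, given the predicted
    noise eps and the cumulative alphas at steps t and t-1."""
    # Predict the clean latent implied by the current noise estimate.
    z0_pred = (z_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    # Re-noise it to the previous timestep's noise level.
    return np.sqrt(alpha_prev) * z0_pred + np.sqrt(1.0 - alpha_prev) * eps
```

With w = 1 the guided prediction reduces to the conditional one; larger w pushes the sample toward the text condition C, which is why 7.5 is a common default for Stable Diffusion pipelines.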
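The same row describes the WFA mechanism as a self-attention replacement in which each token attends within a temporal window of size λ (e.g., 3). The paper releases no code, so the sketch below is our own reading of that description, assuming a centred window over frames and standard scaled dot-product attention; the function name and tensor layout are hypothetical.

```python
import numpy as np

def window_frame_attention(tokens, window=3):
    """Sketch of window-based frame attention (WFA).

    tokens: (F, N, D) array -- F frames, N tokens per frame, D channels.
    Each query token attends to the tokens of all frames inside a
    temporal window of `window` frames centred on its own frame.
    """
    F, N, D = tokens.shape
    half = window // 2
    out = np.empty_like(tokens)
    for i in range(F):
        lo, hi = max(0, i - half), min(F, i + half + 1)
        kv = tokens[lo:hi].reshape(-1, D)       # keys/values from the window
        q = tokens[i]                           # (N, D) queries of frame i
        scores = q @ kv.T / np.sqrt(D)          # scaled dot-product scores
        scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)
        out[i] = attn @ kv
    return out
```

With window=1 this collapses to per-frame self-attention; window=3 matches the ablation finding that a small window already maintains temporal consistency for low-motion videos without much extra compute, since each query sees only 3N keys instead of F·N.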