IV-mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis
Authors: Shitong Shao, Zikai Zhou, Lichen Bai, Haoyi Xiong, Zeke Xie
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments have demonstrated that IV-mixed Sampler achieves state-of-the-art performance on 4 benchmarks including UCF-101-FVD, MSR-VTT-FVD, Chronomagic-Bench-150/1649, and VBench. For example, the open-source AnimateDiff with IV-mixed Sampler reduces the UMT-FVD score from 275.2 to 228.6, approaching the 223.1 achieved by the closed-source Pika-2.0. Our code is released at https://github.com/xie-lab-ml/IV-mixed-Sampler. |
| Researcher Affiliation | Collaboration | 1Hong Kong University of Science and Technology (Guangzhou), 2Baidu Inc. |
| Pseudocode | Yes | 1) We construct IV-mixed Sampler under a rigorous mathematical framework and demonstrate, through theoretical analysis, that it can be elegantly transformed into a standard inverse ordinary differential equation (ODE) process. For the sake of intuition, we present IV-mixed Sampler (i.e., IV-IV) in Fig. 2 and its pseudocode in Appendix B. |
| Open Source Code | Yes | Our code is released at https://github.com/xie-lab-ml/IV-mixed-Sampler. Our project page can be found at https://klayand.github.io/IV-mixed-Sampler. |
| Open Datasets | Yes | Our experiments have demonstrated that IV-mixed Sampler achieves state-of-the-art performance on 4 benchmarks including UCF-101-FVD, MSR-VTT-FVD, Chronomagic-Bench-150/1649, and VBench. For example, the open-source AnimateDiff with IV-mixed Sampler reduces the UMT-FVD score from 275.2 to 228.6, approaching the 223.1 achieved by the closed-source Pika-2.0. Our code is released at https://github.com/xie-lab-ml/IV-mixed-Sampler. |
| Dataset Splits | Yes | For our evaluation, we utilize all 497 validation videos. To ensure evaluation stability, we synthesize a total of 1,491 videos based on prompts from these validation videos, with each prompt producing 3 different videos. Specifically, we synthesize 5 videos for each of the 101 prompts provided by Ge et al. (2023), resulting in a total of 505 synthesized videos. We then compute the FVD between these 505 synthesized videos and 505 randomly sampled videos from the UCF-101 dataset (5 per class), using the built-in FVD evaluation code from Open-Sora-Plan. |
| Hardware Specification | Yes | In the practical implementation, the computational overhead increased from 21 s to 92 s on a single RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch-like style' for pseudocode, but does not provide specific version numbers for PyTorch or other software libraries. |
| Experiment Setup | Yes | For all comparison experiments, we use the IV-IV form and perform IV-mixed Sampler at all time steps of the standard DDIM sampling. In addition, γ^{go}_{t=0}, γ^{back}_{t=0}, γ^{go}_{t=1}, and γ^{back}_{t=1} are all set to 4. For both AnimateDiff and ModelScope-T2V, we use Stable Diffusion (SD) V1.5 as the IDM. Note that we experimented with using MiniSD as the IDM for ModelScope-T2V to maintain a consistent resolution of 256×256. However, as illustrated in Table 6, we found that its performance was inferior to using SD V1.5 with upsampling and downsampling. For VideoCrafter V2, we use Realistic Vision V6.0 B1 (Mage.Space, 2023) as the IDM to accommodate a resolution of 512×320. For the remaining configurations, we follow the sampling form recommended by the corresponding VDMs. Furthermore, we find that applying IV-IV at every step on VideoCrafter V2 destroys temporal coherence. Therefore, we replace IV-IV with VV-VV for z% of the steps. The results of the ablation experiments are shown in Table 7. We finally chose z%=66.7% as the final solution. |
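
The Experiment Setup row describes a per-step schedule: IV-IV is applied at DDIM steps, with VV-VV substituted for z% of the steps on VideoCrafter V2, and all four guidance scales γ set to 4. A minimal sketch of how such a schedule could be built is below; the function name `build_schedule`, the choice of *which* steps receive VV-VV (here, the final z% of the trajectory), and the `GAMMA` dictionary keys are assumptions for illustration, not the authors' implementation.

```python
def build_schedule(num_steps: int, z_percent: float) -> list:
    """Label each DDIM step with the sub-sampler to apply.

    The paper replaces IV-IV with VV-VV for z% of the steps on
    VideoCrafter V2; which steps are replaced is an assumption here
    (we place the VV-VV steps at the end of the trajectory).
    """
    n_vv = round(num_steps * z_percent / 100.0)
    return ["IV-IV"] * (num_steps - n_vv) + ["VV-VV"] * n_vv


# All four guidance scales are set to 4 in the paper; the key names
# mirror the gamma_{t=0/1}^{go/back} notation and are hypothetical.
GAMMA = {"t0_go": 4, "t0_back": 4, "t1_go": 4, "t1_back": 4}

# With z% = 66.7% (the paper's final choice) and 30 DDIM steps,
# 20 of 30 steps use VV-VV.
schedule = build_schedule(num_steps=30, z_percent=66.7)
```

For the comparison experiments (z% = 0), every step would simply be labeled IV-IV.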
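
The Dataset Splits row specifies the UCF-101 FVD protocol: 505 synthesized videos (5 per prompt over 101 class prompts) are compared against 505 real videos drawn 5 per class. A sketch of the real-video sampling step is below, assuming UCF-101's 101 classes; the actual FVD computation uses Open-Sora-Plan's built-in evaluation code, which is not reproduced here, and `sample_real_videos` is a hypothetical helper.

```python
import random


def sample_real_videos(videos_by_class: dict, per_class: int = 5) -> list:
    """Randomly draw `per_class` videos from each class, matching the
    paper's 5-per-class sampling over UCF-101's 101 classes."""
    picked = []
    for cls in sorted(videos_by_class):
        picked.extend(random.sample(videos_by_class[cls], per_class))
    return picked


# Toy stand-in for UCF-101: 101 classes with 20 placeholder clips each.
dataset = {f"class_{i}": [f"class_{i}_vid_{j}" for j in range(20)]
           for i in range(101)}

# 101 classes x 5 videos -> 505 real videos, matching the 505
# synthesized videos (101 prompts x 5 generations).
real_videos = sample_real_videos(dataset)
```
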